MHGH28-XTC not working

All right, it could be some really old FW on that card. Let’s try something different:

Try to burn using the mlxburn tool (also part of the MFT package) while specifying the device type:

mlxburn -dev_type 25418 -dev <mst device> -image <firmware image>

The complete manual for the MFT tools is at: http://www.mellanox.com/pdf/MFT/MFT_user_manual.pdf

Give it a try; let me know how things go…

I tried that, here is the result. The .bin file is in the same folder. The cards are all rev A3

Ok, now I see what ‘unhappy with driver support’ means.

There appears to be no SRP driver in the Windows 2012 package. That is a fair-sized blow to my plans. Oh well, back to Windows 2008r2 then.

Not 100% sure I did this right

C:\Users\Administrator>cd C:\Program Files\Mellanox\WinMFT

C:\Program Files\Mellanox\WinMFT>mst status

MST devices:


mt25418_pciconf0

mt25418_pci_cr0

C:\Program Files\Mellanox\WinMFT>mlxburn -dev_type 25418 -dev mt25418_pci_cr0 -image fw-25408-2_9_1000-MHGH28-XTC_A2-A3.bin

-E- Read a corrupted device id (0xffff). Probably HW/PCI access problem

-E- Can not open mt25418_pci_cr0: MFE_CR_ERROR

-E- Image burn failed: child process exited abnormally

C:\Program Files\Mellanox\WinMFT>
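For anyone scripting this, here is a minimal Python sketch that pulls the device names out of `mst status` output like the listing above. The function name `parse_mst_devices` is made up for illustration, and the parsing assumes the output looks like what was pasted in this thread (a "MST devices:" header followed by one name per line); other MFT versions may print extra columns or separator rows.

```python
from __future__ import annotations


def parse_mst_devices(status_output: str) -> list[str]:
    """Return the device names listed after the 'MST devices:' header."""
    devices = []
    in_list = False
    for raw in status_output.splitlines():
        line = raw.strip()
        if line.startswith("MST devices"):
            in_list = True
            continue
        # Skip anything before the header, blank lines, and dash-only separators.
        if not in_list or not line or set(line) <= {"-"}:
            continue
        # Keep only the first token in case a description follows the name.
        devices.append(line.split()[0])
    return devices


sample = """MST devices:

mt25418_pciconf0

mt25418_pci_cr0
"""

print(parse_mst_devices(sample))
```

Both names refer to the same card; they are just two different access methods, which matters for the burn attempts discussed in this thread.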

Here is a newbie question: should it make a difference at this point whether the card is connected to a switch? (It is not at the moment.)

Do I really need a switch if I’m going from server A to server B via cable?

no… no need to connect to a switch in order to burn the FW.

try with the other MST device please (mt25418_pciconf0)

Also… look at toddh’s post above. He was able to use the flint tool with the -nofs flag and get by.

no. Opensm is not a requirement for flashing FW but keep it in mind for later on after you are able to get a working set of cards.

I tried it on 2 different machines, in 6 different PCIe slots, with 5 OSes (7, 8, 2008r2, 2012 and Ubuntu). Sadly, my Ubuntu skills are depressing.

I was able to update the firmware on the cards using a command given to me yesterday. It was at 2.6.0; now all three are at 2.9.1000. The command that I was using before was giving me a HW error, so I wasn’t sure what was going on.

Marc,

that’s good news - you were able to update the FW. well done.

How do things look afterwards?

Regarding OS support: Windows shouldn’t be a problem with any of the versions you mentioned. Mellanox has drivers available for download on its web site. In fact, Server 2012 comes with an “inbox” driver (which means you don’t need to download anything).

As for Linux: the driver for Ubuntu (and all other Debian flavors) is not there yet; it is coming soon. There shouldn’t be any problem working with all RH, CentOS and SUSE releases.

good luck my friend.

Yeah there was a lot of progress last night. The actual command that fixed it was

mlxburn -dev_type 25418 -dev mt25418_pciconf0 -image fw-25408-2_9_1000-MHGH28-XTC_A2-A3.bin
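A rough Python sketch of the same idea, for anyone who has several cards to do: build the mlxburn command line for each MST device, trying the pciconf device first because that is the one that worked here after pci_cr0 failed. The helper names `order_devices` and `mlxburn_argv` are hypothetical, only the flags already shown in this thread are used, and the script just prints the commands rather than running them.

```python
def order_devices(devices):
    """Sort so that pciconf devices come before pci_cr devices."""
    # sorted() is stable, so devices of the same kind keep their original order.
    return sorted(devices, key=lambda name: 0 if "pciconf" in name else 1)


def mlxburn_argv(device, image, dev_type="25418"):
    """Build the argument list for the mlxburn call used in this thread."""
    return ["mlxburn", "-dev_type", dev_type, "-dev", device, "-image", image]


devices = ["mt25418_pci_cr0", "mt25418_pciconf0"]
image = "fw-25408-2_9_1000-MHGH28-XTC_A2-A3.bin"

# Dry run: print each candidate command in the order it would be tried.
for dev in order_devices(devices):
    print(" ".join(mlxburn_argv(dev, image)))
```

If you do wire this up to actually run the tool, stop at the first device that burns successfully; there is no reason to burn the same card twice through both access paths.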

I’m waiting for the cables to arrive (might be a few days still, will try and borrow some from work). I was able to assign an IP and ping it, so that stack is at least working. Can’t wait to get the cables now. I am running a Hyper-V server and will give CentOS a whirl

Just to save trouble down the road: RH/CentOS 6.4 is very new. Mellanox hasn’t released an IB driver for this kernel yet. It will probably be released in a few months.

you can:

  • switch over to 6.3 or

  • continue with 6.4 and work with the inbox driver (which should be fine, I guess).

i will mark the post about the mlxburn as the correct answer.

Cheers…

Thanks Justin.

I think the most frustrating part was seeing the error in windows and not being able to update the firmware when everything indicated it was a fw issue.

I’m in the process of rebuilding my servers and installing them in rack mounts (part of the man cave) so it might be a little while before I get the time to tinker with CentOS. I’ll keep your offer in mind though.

Would anyone be able to list that last post of mine with the mlxburn command as a valid answer?

Just saw this post. I picked up some HP 448397-B21 cards at one point that were MHGH28-XTC and was getting errors like this. I used the flint command as follows. The -nofs allows burning without certain failsafes which solved my errors.

flint -d mt25418_pciconf0 -i fw-25408-2_9_1000-MHGH28-XTC_A2-A3.bin -nofs -allow_psid_change burn

You will see the following

Current FW version on flash: 2.6.0

New FW version: 2.9.1000

You are about to replace current PSID on flash - “HP_09D0000001” with a different PSID - “MT_04A0120002”.

Note: It is highly recommended not to change the PSID.

Do you want to continue ? (y/n) [n] : y

Burn process will not be failsafe. No checks will be performed.

ALL flash, including the Invariant Sector will be overwritten.

If this process fails, computer may remain in an inoperable state.

Do you want to continue ? (y/n) [n] : y
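Since the PSID change is the part that is easy to click through without reading, here is a small Python sketch that extracts the versions and PSIDs from the confirmation text above so they can be logged before answering "y". The function name `summarize_burn_prompt` is made up, and the patterns are written against the wording pasted in this thread only; the quote handling accepts both straight and curly quotes because the forum has converted them.

```python
import re

QUOTE = r'["“”]'
NOT_QUOTE = r'[^"“”]+'


def summarize_burn_prompt(text):
    """Extract firmware versions and PSIDs from flint's confirmation text."""
    current = re.search(r"Current FW version on flash:\s*(\S+)", text)
    new = re.search(r"New FW version:\s*(\S+)", text)
    psid = re.search(
        r"current PSID on flash - " + QUOTE + "(" + NOT_QUOTE + ")" + QUOTE
        + r"\s+with a different PSID - " + QUOTE + "(" + NOT_QUOTE + ")" + QUOTE,
        text,
    )
    return {
        "current_fw": current.group(1) if current else None,
        "new_fw": new.group(1) if new else None,
        "old_psid": psid.group(1) if psid else None,
        "new_psid": psid.group(2) if psid else None,
        # True only when flint announced that the PSID will be replaced.
        "psid_changes": psid is not None,
    }


sample = (
    "Current FW version on flash: 2.6.0\n"
    "New FW version: 2.9.1000\n"
    "You are about to replace current PSID on flash - “HP_09D0000001” "
    "with a different PSID - “MT_04A0120002”.\n"
)

info = summarize_burn_prompt(sample)
print(info)
```

In this thread the old PSID is the HP one and the new PSID is the Mellanox one, so it is worth writing the old value down somewhere before continuing.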

Not wishing to hijack, but I have noticed a bigger issue with this card on a new Windows install since I received a few (yep, from eBay).

Note: my card seems to be an A1.

On installing the card on my server, Windows Server 2012 Standard reports a conflicting drivers issue (standard Windows drivers). I then updated with the Mellanox package but this resulted in the same issue (uninstalling and deleting the old driver made no difference).

I decided to put a fresh install on my server to clear out anything that may be causing issues (was about time and this is a home setup).

On installing Windows Server 2012 Essentials the server went through all the usual install steps and reboots until just before it would let you finally log in for the first time, and then it reported ‘An install error has occured’ with the only option being to save the log file and shut down. You can see the system sitting behind the install screen but cannot access it. I tried a second time from fresh drives to make sure and the same thing happened again.

I then installed ESXi 5.1 and installed Win Server 2012 Essentials as a VM on top. Everything was fine until I used VT-d to pass the Mellanox controller through to the VM. After this the VM would not start. It would hang at the Windows loading screen and then just revert to a powered-off state.

I finally pulled the card and installed Win Server 2012 Ess on to the bare server again and it installed without an issue. Tonight I will be trying to put the card back in and seeing what I can do.

The reason for wanting Win Server is that I need OpenSM running and do not intend to have any Linux environments with direct control of an InfiniBand HCA, as they will all be in VMs running on ESXi (whose drivers also don’t have a subnet manager).

The Server with Windows is a HP ML110G7 so pretty standard hardware. The cards seem to be fine on my ESXi 5.1 servers, as much as I can tell without a subnet manager running. I suspect the firmware on the card is the problem (fingers crossed) but have not been able to verify yet.

The info on flashing the card in this thread will most likely be very helpful. Thanks

mblanke: Btw, I think you might be the only one able to mark your post with the working mlxburn command as the answer (other than admin persons I guess).

Ok, all the error messages have gone away after changing the A1 (flashed) card to the A2 (unflashed) in the machine on port 2. It was also able to pickup the SRP target straight away again.

The port 5 machine is not on so I am guessing that is why there are no errors from that A1 card. This is the machine with Windows 2012 Ess on it that will not work with SRP. I have now put Windows in a VM but the underlying ESXi also cannot see the SAN SRP target. The A1 card would not work at all with Windows unless it was flashed.

Hopefully there is a little tweaking that can be done with one of the firmware tools to correct this issue, but I have no idea what that may be.

yairi, rimblock:

Sorry for joining in late; I’ve been preoccupied.

I have the cards “working” as in the firmware and drivers are recognized. I have 2 servers running win 2008r2 and a win7 x64

The DC is configured with opensm and I can use it to connect one of two ports to either machine, however the other port of the server doesn’t connect. Do I have to run opensm on each port ?

DC port 1 to RAID port 1

DC port 2 to Win7 port 1

RAID port 2 to Win7 port 2

That is the way I have it wired. The DC ports only work one at a time: if I disconnect one, the other turns on.

any suggestions?
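One way to reason about the wiring above, offered as an assumption rather than a confirmed diagnosis: as far as I know an HCA does not forward traffic between its own two ports, so with back-to-back cabling and no switch each cable forms its own small subnet, and each subnet needs a subnet manager bound to a port on one of its two ends. The Python sketch below only models that idea; the `links_without_sm` helper and the port labels are invented for illustration and it does not talk to OpenSM in any way.

```python
# Each cable is a pair of (host, port) endpoints, as wired in the post above.
links = [
    (("DC", 1), ("RAID", 1)),
    (("DC", 2), ("Win7", 1)),
    (("RAID", 2), ("Win7", 2)),
]


def links_without_sm(links, sm_ports):
    """Return the cables where neither end has a subnet manager bound to it."""
    covered = set(sm_ports)
    return [link for link in links if not (set(link) & covered)]


# Assumption: a single OpenSM instance on the DC bound to port 1 only.
print(links_without_sm(links, [("DC", 1)]))

# Assumption: one instance per DC port; the RAID-to-Win7 cable is still uncovered.
print(links_without_sm(links, [("DC", 1), ("DC", 2)]))
```

Under that assumption, a single opensm instance bound to one DC port would explain why only one DC link comes up at a time, and the RAID to Win7 cable would need its own subnet manager on one of those two machines.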

That’s how I’m setting it up. I was just trying to help out rimblock, as he seems to be having the same issues I was having. rimblock, are you getting an error code 10?

Check if the card is in an x8 PCIe slot.

Hi,

The mst status command responds with…

MST devices:


mt25418_pciconf0

mt25418_pci_cr0

C:\Program Files\Mellanox\WinMFT>mlxburn -dev_type 25418 -dev mt25418_pci_cr0 -image fw-25408-2_9_1000-MHGH28-XTC_A1.bin

-E- Read a corrupted device id (0xffff). Probably HW/PCI access problem

-E- Can not open mt25418_pci_cr0: MFE_CR_ERROR

-E- Image burn failed: child process exited abnormally

Same as you had
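Since two people have now hit the identical failure, here is a small Python sketch that picks the `-E-` lines out of the tool output and flags the case seen here (device id read back as 0xffff, or MFE_CR_ERROR, on a pci_cr device) as a hint to retry on the pciconf device. The helper names are hypothetical and the string checks are based only on the messages pasted in this thread.

```python
def error_lines(tool_output):
    """Return the text of every line the tool marked with the -E- prefix."""
    errors = []
    for raw in tool_output.splitlines():
        line = raw.strip()
        if line.startswith("-E-"):
            errors.append(line[len("-E-"):].strip())
    return errors


def suggests_retry_on_pciconf(tool_output, device):
    """True when a pci_cr device failed with the access errors seen in this thread."""
    errors = error_lines(tool_output)
    access_failed = any("0xffff" in e or "MFE_CR_ERROR" in e for e in errors)
    return access_failed and "pci_cr" in device


sample = """-E- Read a corrupted device id (0xffff). Probably HW/PCI access problem
-E- Can not open mt25418_pci_cr0: MFE_CR_ERROR
-E- Image burn failed: child process exited abnormally
"""

print(error_lines(sample))
print(suggests_retry_on_pciconf(sample, "mt25418_pci_cr0"))
```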

Trying Todds flint command…

C:\Program Files\Mellanox\WinMFT>flint -d mt25418_pciconf0 -i fw-25408-2_9_1000-MHGH28-XTC_A1.bin -nofs -allow_psid_change burn

Current FW version on flash: 2.6.0

New FW version: 2.9.1000

You are about to replace current PSID on flash - “HP_09D0000001” with a different PSID - “MT_04A0110002”.

Note: It is highly recommended not to change the PSID.

Do you want to continue ? (y/n) [n] : y

Burn process will not be failsafe. No checks will be performed.

ALL flash, including the Invariant Sector will be overwritten.

If this process fails, computer may remain in an inoperable state.

Do you want to continue ? (y/n) [n] : y

Burning FW image without signatures - OK

Restoring signature - OK

C:\Program Files\Mellanox\WinMFT>

Now I am seeing

“Insufficient system resources exist to complete the API.” in the device manager.

Trying for a reboot.