InfiniBand: Mellanox Technologies MT25208 [InfiniHost III Ex] (rev a0) vs InfiniBand: Mellanox Technologies MT25208 [InfiniHost III Ex] (rev a20)

Hi Guys

I really need some help here. I’m setting up a small cluster and decided to try InfiniBand instead of 10 Gigabit Ethernet.

So here I am at step one with 3 x Cisco SFS-HCA-250-A1 cards (Mellanox Technologies MT25208 [InfiniHost III Ex] (rev a0)) and 2 x Mellanox CX4 InfiniBand dual-port 10Gb network cards, model MHGA28-XTC (Mellanox Technologies MT25208 [InfiniHost III Ex] (rev a20)).

It turns out all the cards are the same model and the same hardware revision, A3.

I’ve updated all the cards to firmware version 5.3.

My issue is that the Mellanox CX4s work perfectly, but I cannot for the life of me get the Cisco variants working. What is ultimately the difference between the two cards, and what else do I need to change on the Cisco-branded cards to get them to work like their Mellanox CX4 twins?

It cannot be anything but hardware or firmware related, as I can swap a Cisco card for a Mellanox one and everything works perfectly. When I run a query with mstflint I get pretty much the same result from both card variants, as shown below, apart from those weird “▒ڭ” symbols prefixing the Board ID and VSD fields, which are easily changed during a firmware update if required.

root@proxmox1:~# mstflint -d 0c:00.0 q

Image type: Failsafe

FW Version: 5.3.0

I.S. Version: 1

Device ID: 25218

Chip Revision: A0

Description: Node Port1 Port2 Sys image

GUIDs: 0005ad00000c1464 0005ad00000c1465 0005ad00000c1466 0005ad00000c1467

Board ID: ▒ڭ (MT_0370130002)

VSD: ▒ڭ

PSID: MT_0370130002
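
Since the claim is that both card variants report nearly the same query output, a quick way to check that field by field is to diff the two outputs. Here is a minimal Python sketch; the function names parse_query and diff_queries are made up for illustration, and it assumes you paste in only the "Key: value" lines printed by the tool (not the shell prompt line).

```python
def parse_query(text):
    """Parse 'Key: value' lines from query output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def diff_queries(a, b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    fa, fb = parse_query(a), parse_query(b)
    return {
        k: (fa.get(k), fb.get(k))
        for k in sorted(set(fa) | set(fb))
        if fa.get(k) != fb.get(k)
    }


# First string is taken from the output above; the second is a hypothetical
# second card used only to show the call.
cisco = "FW Version: 5.3.0\nChip Revision: A0\nPSID: MT_0370130002"
other = "FW Version: 5.3.0\nChip Revision: A0\nPSID: MT_0370130002"
print(diff_queries(cisco, other))  # prints {} when the reported fields match
```

Keep in mind an empty diff only means the reported fields agree. As this thread shows, the board configuration baked into the image can still differ.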

Below are the errors I get when running dmesg | grep ib.

root@proxmox1:~# dmesg | grep ib

[ 0.000000] tsc: Fast TSC calibration using PIT

[ 0.001006] Calibrating delay loop (skipped), value calculated using timer frequency… 5320.09 BogoMIPS (lpj=2660045)

[ 0.276638] vgaarb: bridge control possible 0000:10:0d.0

[ 0.677246] libphy: Fixed MDIO Bus: probed

[ 0.746569] libata version 3.00 loaded.

[ 1.596026] tsc: Refined TSC clocksource calibration: 2659.999 MHz

[ 5.220277] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)

[ 5.220280] ib_mthca: Initializing 0000:0c:00.0

[ 5.406654] [drm] ib test succeeded in 0 usecs

[ 6.224473] ib_mthca 0000:0c:00.0: RUN_FW command returned -22, aborting.

[ 6.224624] ib_mthca 0000:0c:00.0: Loading FW returned -22, aborting.

[ 6.224852] ib_mthca: probe of 0000:0c:00.0 failed with error -22

[ 14.335250] netlink: Unknown key attribute (type=20, max=19).
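
A plain grep for "ib" also catches unrelated lines (calibration, libphy, libata and so on). A rough Python sketch that keeps only the ib_mthca lines and decodes the negative return code might look like the following. The function name mthca_errors is made up for illustration. Kernel return codes are negated errno values, so -22 is -EINVAL ("Invalid argument"); that name alone does not say why the firmware refused to start.

```python
import errno
import os
import re

# Matches ib_mthca kernel lines such as
# "[ 6.224473] ib_mthca 0000:0c:00.0: RUN_FW command returned -22, aborting."
MTHCA_LINE = re.compile(r"^\[\s*[\d.]+\]\s+ib_mthca\b")
# Captures the number in "returned -22" or "error -22".
NEG_CODE = re.compile(r"(?:returned|error)\s+-(\d+)")


def mthca_errors(dmesg_text):
    """Yield (line, errno_name, message) for ib_mthca lines with a negative code."""
    for line in dmesg_text.splitlines():
        if not MTHCA_LINE.match(line):
            continue
        m = NEG_CODE.search(line)
        if m:
            code = int(m.group(1))
            yield line, errno.errorcode.get(code, "?"), os.strerror(code)
```

Feeding it the dmesg text above should yield the RUN_FW, "Loading FW" and probe lines, each tagged EINVAL.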

Any contribution would be greatly appreciated. I really do feel like I’m very close to resolving the issue; I just need to understand the differences between the Cisco and Mellanox cards to close this off.

Warm Regards

Shaun

Here are pics of the back of each card just for sanity purposes.

Mellanox CX4

Cisco SFS-HCA-250-A1

Ahh cool. That makes sense.

Hi Justin

I removed the DDR serdes section from the default config file…

On Fri, Jan 16, 2015 at 12:00 AM, justinclift <community@mellanox.com>

Hey all,

Just to let everyone know, and to close this one off: I resolved the issue.

Originally I flashed both the Mellanox CX4 and the Cisco variants with the same firmware for cards with PSID MT_0370130002 using flint. After flashing, all cards were at firmware version 5.3. The Mellanox CX4s worked but the Ciscos did not.

To resolve it I used mlxburn with the default 5.3 firmware and a slightly different config file (I won’t go into detail around this) to generate the .bin firmware file.

I then flashed the cards with flint using the new firmware, and the cards are working now.
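
For anyone retracing the two steps (generate an image with mlxburn, then burn it with flint), here is a hedged Python sketch that only builds the command lines and does not run anything. It assumes the usual mlxburn options -fw, -conf and -wrimage and the flint form "-d <device> -i <image> burn". Every file name and the device string are placeholders, not values from this thread, and the config changes themselves are not reproduced here.

```python
def build_commands(fw_mlx, conf_ini, out_bin, device):
    """Return two argv lists: image generation first, then burning."""
    # Step 1: combine the firmware release file with a board config file
    # and write the resulting binary image to out_bin.
    generate = ["mlxburn", "-fw", fw_mlx, "-conf", conf_ini, "-wrimage", out_bin]
    # Step 2: burn that binary image onto the card.
    burn = ["flint", "-d", device, "-i", out_bin, "burn"]
    return generate, burn


# Placeholder arguments only; substitute your own files and device.
gen, burn = build_commands("firmware.mlx", "board.ini", "custom.bin", "<mst-device>")
print(" ".join(gen))
print(" ".join(burn))
```

Printing the commands first makes it easy to double-check the image path before anything is written to the card, since a bad burn can leave an HCA unusable.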

I’ve attached a link to the firmware to this thread in case anyone ever needs it.

Dropbox - cisco-sfs_HCS_250_Infini.bin

Out of curiosity, which config settings did you change?