I really need some help here. I’m setting up a small cluster and decided to play with InfiniBand instead of using 10 Gb Ethernet.
So here I am at step one with 3 x Cisco SFS-HCA-250-A1 cards (Mellanox Technologies MT25208 [InfiniHost III Ex] (rev a0)) and 2 x Mellanox CX4 Infiniband Dual Port 10GB Network Card, model MHGA28-XTC (Mellanox Technologies MT25208 [InfiniHost III Ex] (rev a20)).
It turns out all the cards are the same model and the same hardware revision, A3.
I’ve updated all the cards to firmware version 5.3.
My issue is that the Mellanox CX4s are working perfectly, but I cannot for the life of me get the Cisco variants working. What is ultimately the difference between the two cards, and what else do I need to change on the Cisco-branded cards to get them to work like their Mellanox CX4 twins?
It cannot be anything but hardware or firmware related, as I can swap a Cisco card for a Mellanox one and everything works perfectly. When I run a query with mstflint I get pretty much the same result from both card variants, as shown below, apart from those weird “▒ڭ” symbols prefixing the Board ID and VSD fields, which can easily be changed during a firmware update if required.
root@proxmox1:~# mstflint -d 0c:000.0 q
Image type: Failsafe
FW Version: 5.3.0
I.S. Version: 1
Device ID: 25218
Chip Revision: A0
Description: Node Port1 Port2 Sys image
GUIDs: 0005ad00000c1464 0005ad00000c1465 0005ad00000c1466 0005ad00000c1467
Board ID: ▒ڭ (MT_0370130002)
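One way to make the "pretty much the same result" comparison exact would be to save the query output from each card variant to a file and diff the two. This is only a rough sketch: the helper name `compare_queries` and the file names are made up for illustration, and it drops the GUIDs line because that is expected to differ per card.

```shell
# Hypothetical helper: show the fields that differ between two saved
# mstflint query outputs. The GUIDs line is dropped first because it is
# expected to be unique to each card.
compare_queries() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    grep -v '^GUIDs:' "$1" > "$tmp_a"
    grep -v '^GUIDs:' "$2" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    # 0 = no differences, 1 = differences printed by diff
    return "$status"
}

# Usage (file names are placeholders; save one query per card variant,
# using the same query command as in the post):
#   mstflint -d <device> q > cisco.txt
#   mstflint -d <device> q > mellanox.txt
#   compare_queries cisco.txt mellanox.txt
```

Any line that diff prints is a field that differs between the two variants.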
Below are the errors I get when running dmesg | grep ib.
root@proxmox1:~# dmesg | grep ib
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.001006] Calibrating delay loop (skipped), value calculated using timer frequency… 5320.09 BogoMIPS (lpj=2660045)
[ 0.276638] vgaarb: bridge control possible 0000:10:0d.0
[ 0.677246] libphy: Fixed MDIO Bus: probed
[ 0.746569] libata version 3.00 loaded.
[ 1.596026] tsc: Refined TSC clocksource calibration: 2659.999 MHz
[ 5.220277] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
[ 5.220280] ib_mthca: Initializing 0000:0c:00.0
[ 5.406654] [drm] ib test succeeded in 0 usecs
[ 6.224473] ib_mthca 0000:0c:00.0: RUN_FW command returned -22, aborting.
[ 6.224624] ib_mthca 0000:0c:00.0: Loading FW returned -22, aborting.
[ 6.224852] ib_mthca: probe of 0000:0c:00.0 failed with error -22
[ 14.335250] netlink: Unknown key attribute (type=20, max=19).
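Since `grep ib` also matches unrelated lines (calibration, libphy, libata and so on), a narrower filter makes the driver failure easier to spot. A minimal sketch, assuming a made-up helper name `mthca_lines`; it reads kernel log text from stdin, so it works on live dmesg output or a saved log:

```shell
# Hypothetical helper: keep only the lines logged by the ib_mthca driver.
# -F treats the pattern as a fixed string rather than a regular expression.
mthca_lines() {
    grep -F 'ib_mthca'
}

# Usage (needs permission to read the kernel log):
#   dmesg | mthca_lines
```

Run against the output above, this would leave only the driver banner, the "Initializing" line, and the three lines reporting error -22.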
Any contribution would be greatly appreciated. I really do feel like I’m very close to resolving the issue, but I need to understand the differences between the Cisco and Mellanox cards to close this off.
Here are pics of the back of each card just for sanity purposes.