Hi,
I am trying to connect two ConnectX-8 cards back to back between two servers. The link comes up and negotiates the speed, but the port state never leaves Initializing.
Here is the link status. I am using an InfiniBand-specific cable, OSFPFL-400G-PC01:
sudo mlxlink -d /dev/mst/mt4131_pciconf0
Operational Info
State : Active
Physical state : N/A
Speed : IB-NDR
Width : 4x
FEC : Interleaved_Standard_RS_FEC_PLR - (544,514)
Loopback Mode : No Loopback
Auto Negotiation : ON
Supported Info
Enabled Link Speed : 0x000000c1 (NDR,HDR,SDR)
Supported Cable Speed : 0x000000f1 (NDR,HDR,EDR,FDR,SDR)
Troubleshooting Info
Status Opcode : 0
Group Opcode : N/A
Recommendation : No issue was observed
Tool Information
Firmware Version : 40.47.1088
amBER Version : 5.75
MFT Version : 4.34.1-10
Here is ibstat:
ibstat
CA 'mlx5_0'
CA type: MT4131
Number of ports: 1
Firmware version: 40.47.1088
Hardware version: 0
Node GUID: 0xXXXXXXXXde4
System image GUID: 0xXXXXXXXde4
Port 1:
State: Initializing
Physical state: LinkUp
Rate: 400
Base lid: 65535
LMC: 0
SM lid: 0
Capability mask: 0XXXXXc48
Port GUID: 0xXXXXXXXde4
Link layer: InfiniBand
ibstat of the other card:
CA 'mlx5_0'
CA type: MT4131
Number of ports: 1
Firmware version: 40.47.1088
Hardware version: 0
Node GUID: 0xXXXXXXX76a
System image GUID: 0xXXXXXXX76a
Port 1:
State: Initializing
Physical state: LinkUp
Rate: 400
Base lid: 65535
LMC: 0
SM lid: 0
Capability mask: 0xXXXXXXc48
Port GUID: 0xXXXXXXX76a
Link layer: InfiniBand
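To compare the two ibstat outputs above without reading them field by field, here is a minimal Python sketch that parses ibstat-style text and flags ports that look like they are waiting for a subnet manager. The helper names `parse_ibstat` and `waiting_for_sm` are hypothetical, and the check rests on an assumption: that `State: Initializing` together with `Physical state: LinkUp` and `SM lid: 0` means the physical link is up but no subnet manager has configured the port yet. The sample text is trimmed from the output above.

```python
import re


def parse_ibstat(text):
    """Parse ibstat-style text into {ca_name: {port_number: {field: value}}}."""
    cas = {}
    ca = port = None
    for raw in text.splitlines():
        line = raw.strip()
        # "CA 'mlx5_0'" starts a new adapter (straight or curly quotes accepted).
        m = re.match(r"CA\s+['‘’](.+?)['‘’]$", line)
        if m:
            ca = cas.setdefault(m.group(1), {})
            port = None
            continue
        # "Port 1:" starts a new port section under the current adapter.
        m = re.match(r"Port\s+(\d+):$", line)
        if m and ca is not None:
            port = ca.setdefault(int(m.group(1)), {})
            continue
        # Only "key: value" lines inside a port section are kept.
        if port is not None and ":" in line:
            key, _, value = line.partition(":")
            port[key.strip()] = value.strip()
    return cas


def waiting_for_sm(port):
    """Assumed heuristic: link is physically up but no SM has configured the port."""
    return (port.get("State") == "Initializing"
            and port.get("Physical state") == "LinkUp"
            and port.get("SM lid") == "0")


SAMPLE = """CA 'mlx5_0'
    CA type: MT4131
    Number of ports: 1
    Port 1:
        State: Initializing
        Physical state: LinkUp
        Rate: 400
        Base lid: 65535
        LMC: 0
        SM lid: 0
        Link layer: InfiniBand
"""

if __name__ == "__main__":
    for ca_name, ports in parse_ibstat(SAMPLE).items():
        for number, fields in ports.items():
            print(ca_name, number, fields["State"], waiting_for_sm(fields))
```

On a live system the text could come from running `ibstat` through `subprocess.run` instead of the hard-coded sample.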
When I try to start opensm:
Feb 04 19:09:33 529433 [C74A0740] 0x03 → OpenSM 5.25.1.MLNX20251030.e3791a47
Feb 04 19:09:33 529491 [C74A0740] 0x80 → OpenSM 5.25.1.MLNX20251030.e3791a47
Feb 04 19:09:33 535886 [C74A0740] 0x02 → osm_vendor_init: 1000 pending umads specified
Feb 04 19:09:33 535981 [C74A0740] 0x02 → osm_vendor_init: 1000 pending umads specified
Feb 04 19:09:33 536040 [C74A0740] 0x02 → osm_vendor_init: 1000 pending umads specified
Feb 04 19:09:33 554883 [C74A0740] 0x02 → osm_tenant_mgr_init: tenant mgr is disabled
Feb 04 19:09:33 555039 [C74A0740] 0x80 → Entering DISCOVERING state
Feb 04 19:09:33 555201 [C74A0740] 0x02 → osm_issu_mgr_init: issu_mgr is initialized
Feb 04 19:09:33 555421 [C74A0740] 0x02 → osm_vendor_rebind: Mgmt class 0x81 binding to port GUID 0x90e3170300f0bde4
Feb 04 19:09:33 566448 [C74A0740] 0x01 → osm_vendor_rebind: ERR 5424: Unable to open port 0x90e3170300f0bde4
Feb 04 19:09:33 566466 [C74A0740] 0x01 → osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Feb 04 19:09:33 566473 [C74A0740] 0x01 → osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR) for port guid 0xXXXXXXXde4, port index 0
Feb 04 19:09:33 572124 [C74A0740] 0x02 → osm_tenant_mgr_destroy: osm_tenant_mgr_destroy complete
Feb 04 19:09:33 572153 [C74A0740] 0x02 → osm_issu_mgr_destroy: osm_issu_mgr_destroy complete
Feb 04 19:09:33 572245 [C74A0740] 0x80 → Exiting SM
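To pull only the failures out of a log like the one above, here is a small Python sketch that extracts the `ERR xxxx` entries. The function name `extract_errors` is hypothetical, and the pattern assumes each error line has the form `function_name: ERR <4 hex digits>: message`, as in this log. The sample uses lines copied from the log above, with `->` written in place of the arrow character.

```python
import re

# Assumed line shape: "<function>: ERR <4 hex digits>: <message>"
ERR_RE = re.compile(r"(\w+): ERR ([0-9A-Fa-f]{4}): (.*)$")


def extract_errors(log_text):
    """Return a (function, code, message) tuple for each line with an ERR code."""
    errors = []
    for line in log_text.splitlines():
        m = ERR_RE.search(line)
        if m:
            errors.append(m.groups())
    return errors


SAMPLE_LOG = """\
Feb 04 19:09:33 555039 [C74A0740] 0x80 -> Entering DISCOVERING state
Feb 04 19:09:33 566466 [C74A0740] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
Feb 04 19:09:33 572245 [C74A0740] 0x80 -> Exiting SM
"""

if __name__ == "__main__":
    for function, code, message in extract_errors(SAMPLE_LOG):
        print(code, function, message)
```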
I have also tried different configurations (config 1, 2, and 5) from the user manual: https://docs.nvidia.com/networking/display/nvidia-connectx-8-supernic-user-manual.pdf