Hello NVIDIA community!
We are trying to achieve 100 Gb/s per port on a server using the NVIDIA Mellanox ConnectX-5 (MCX556A-ECAT) adapter. Below is our current setup:
Environment:
- Adapter: ConnectX-5 Dual-Port PCIe Gen3 x16 (MCX556A-ECAT)
- Cable: HDR100 breakout DAC – CBL-MCP7H50-H002R26 (QSFP56 to 2x HDR100)
- Switch: Mellanox Quantum MQM8790 (unmanaged), with ports set to SPLIT_2X mode
- Driver version: MLNX_OFED 5.8-3.0.7.0
- OS: Oracle Linux 8.8
- Kernel: 4.18.0-477.21.1.el8_8.x86_64
Problem:
Even though the physical connections appear correct, the InfiniBand interfaces on the server are limited to 25 Gb/s (1X EDR), as confirmed by ibstatus, ibv_devinfo, and ibdiagnet.
The HDR100 breakout cable is properly connected: the main end is plugged into a 200 Gb/s switch port (configured with SPLIT_2X), and each leg is connected to a separate port on the ConnectX-5 adapter. Both interfaces are up and functional.
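In case it helps anyone retracing our steps: mlxlink (from the MFT package) shows both the negotiated link state and what the attached module reports about itself. A sketch of that check, assuming the ConnectX-5 enumerates as /dev/mst/mt4119_pciconf0 (the MST device name may differ on other hosts):

# Start the MST service and list the detected devices:
mst start
mst status -v

# Negotiated state, width, and speed of port 1:
mlxlink -d /dev/mst/mt4119_pciconf0

# Cable/module EEPROM data (vendor, part number, supported rates):
mlxlink -d /dev/mst/mt4119_pciconf0 -m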
Evidence:
ibstatus
[root@node01 ~]# ibstatus
Infiniband device 'mlx5_0' port 1 status:
    default gid:     fe80:0000:0000:0000:e8eb:d303:00a9:3c42
    base lid:        0x6d
    sm lid:          0x1
    state:           4: ACTIVE
    phys state:      5: LinkUp
    rate:            25 Gb/sec (1X EDR)
    link_layer:      InfiniBand

Infiniband device 'mlx5_1' port 1 status:
    default gid:     fe80:0000:0000:0000:e8eb:d303:00a9:3c43
    base lid:        0x71
    sm lid:          0x1
    state:           4: ACTIVE
    phys state:      5: LinkUp
    rate:            25 Gb/sec (1X EDR)
    link_layer:      InfiniBand
[root@node01 ~]#
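To tell whether the 1X width is an administrative cap or the result of link training, ibportstate (from infiniband-diags) can query the width and speed attributes of a port by LID. A sketch using the LIDs from the output above (exact field names can vary between infiniband-diags versions):

# Query the two HCA ports by LID (0x6d = 109, 0x71 = 113):
ibportstate 109 1 query
ibportstate 113 1 query
# Compare LinkWidthSupported, LinkWidthEnabled, and LinkWidthActive:
# Enabled already at 1X points to a cap; Enabled at 4X with Active
# at 1X means the link trained down.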
ibv_devinfo
[root@node01 ~]# ibv_devinfo
hca_id: mlx5_0
    transport:          InfiniBand (0)
    fw_ver:             16.32.1010
    node_guid:          e8eb:d303:00a9:3c42
    sys_image_guid:     e8eb:d303:00a9:3c42
    vendor_id:          0x02c9
    vendor_part_id:     4119
    hw_ver:             0x0
    board_id:           MT_0000000008
    phys_port_cnt:      1
        port:   1
            state:          PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:     4096 (5)
            sm_lid:         1
            port_lid:       109
            port_lmc:       0x00
            link_layer:     InfiniBand

hca_id: mlx5_1
    transport:          InfiniBand (0)
    fw_ver:             16.32.1010
    node_guid:          e8eb:d303:00a9:3c43
    sys_image_guid:     e8eb:d303:00a9:3c42
    vendor_id:          0x02c9
    vendor_part_id:     4119
    hw_ver:             0x0
    board_id:           MT_0000000008
    phys_port_cnt:      1
        port:   1
            state:          PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:     4096 (5)
            sm_lid:         1
            port_lid:       113
            port_lmc:       0x00
            link_layer:     InfiniBand
[root@node01 ~]#
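Since module handling can depend on HCA firmware, and 16.32.1010 is not the newest ConnectX-5 release, checking for a newer image may be worthwhile. A sketch with mlxfwmanager (the online-update line is illustrative only; verify it against your own update process first):

# Show installed vs. available firmware for all detected adapters:
mlxfwmanager --query
# Optionally fetch and apply the latest image from NVIDIA's servers:
# mlxfwmanager --online -u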
mlxconfig
[root@node01 ~]# mlxconfig -d /dev/mst/SW_MT54000_Quantum_Mellanox_Technologies_lid-0x004a q
Device #1:
----------
Device type: Quantum
Name: MQM8790-HS2X_Ax
Description: Mellanox Quantum(TM) HDR InfiniBand Switch; 40 QSFP56 ports; 2 Power Supplies (AC); unmanaged; standard depth; P2C airflow; Rail Kit; RoHS6
Device: /dev/mst/SW_MT54000_Quantum_Mellanox_Technologies_lid-0x004a
Configurations: Next Boot
SPLIT_MODE SPLIT_2X(1)
DISABLE_AUTO_SPLIT ENABLE_AUTO_SPLIT(0)
SPLIT_PORT Array[1..64]
GB_VECTOR_LENGTH 0
GB_UPDATE_MODE ALL(0)
GB_VECTOR Array[0..7]
[root@node01 ~]#
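One thing worth noting about the output above: it only prints the Next Boot column, and on an unmanaged Quantum the split configuration takes effect only after the switch is power-cycled. It may therefore be useful to compare the currently active values against Next Boot; a sketch, assuming the -e (show default/current) flag is available in your mlxconfig version:

# Show Default, Current, and Next Boot values side by side:
mlxconfig -d /dev/mst/SW_MT54000_Quantum_Mellanox_Technologies_lid-0x004a -e q SPLIT_MODE
mlxconfig -d /dev/mst/SW_MT54000_Quantum_Mellanox_Technologies_lid-0x004a -e q SPLIT_PORT[28]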
ibnetdiscover
[root@node01 ~]# ibnetdiscover | grep -i node01
[55] "H-e8ebd30300a93c42"[1](e8ebd30300a93c42) # "node01 HCA-1" lid 109 1xEDR
[56] "H-e8ebd30300a93c43"[1](e8ebd30300a93c43) # "node01 HCA-2" lid 113 1xEDR
Ca 1 "H-e8ebd30300a93c43" # "node01 HCA-2"
Ca 1 "H-e8ebd30300a93c42" # "node01 HCA-1"
[root@node01 ~]#
Switch-side link status:
Port 55 -> node01 HCA-1 : 1X 25.78125 Gbps Active
Port 56 -> node01 HCA-2 : 1X 25.78125 Gbps Active
SPLIT_PORT settings for the two connected switch ports (55 and 56):
[root@node01 ~]# mlxconfig -d /dev/mst/SW_MT54000_Quantum_Mellanox_Technologies_lid-0x004a q SPLIT_PORT[55..56]
Device #1:
----------
Device type: Quantum
Name: MQM8790-HS2X_Ax
Description: Mellanox Quantum(TM) HDR InfiniBand Switch; 40 QSFP56 ports; 2 Power Supplies (AC); unmanaged; standard depth; P2C airflow; Rail Kit; RoHS6
Device: /dev/mst/SW_MT54000_Quantum_Mellanox_Technologies_lid-0x004a
Configurations: Next Boot
SPLIT_PORT[55] NO_SPLIT(0)
SPLIT_PORT[56] NO_SPLIT(0)
[root@node01 ~]#
Physical port the cable is connected to (switch IB port 28):
[root@node01 ~]# mlxconfig -d /dev/mst/SW_MT54000_Quantum_Mellanox_Technologies_lid-0x004a q SPLIT_PORT[28]
Device #1:
----------
Device type: Quantum
Name: MQM8790-HS2X_Ax
Description: Mellanox Quantum(TM) HDR InfiniBand Switch; 40 QSFP56 ports; 2 Power Supplies (AC); unmanaged; standard depth; P2C airflow; Rail Kit; RoHS6
Device: /dev/mst/SW_MT54000_Quantum_Mellanox_Technologies_lid-0x004a
Configurations: Next Boot
SPLIT_PORT[28] NO_SPLIT(0)
[root@node01 ~]#
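For completeness, this is how we understand the split is enabled on an unmanaged Quantum switch. The value 1 for SPLIT_PORT is our assumption based on the enum numbering mlxconfig prints (NO_SPLIT is 0), so please verify it against the MFT documentation before applying:

# Enable the 2X split on physical port 28 (value 1 assumed to mean "split"):
mlxconfig -d /dev/mst/SW_MT54000_Quantum_Mellanox_Technologies_lid-0x004a set SPLIT_PORT[28]=1
# The setting lands under "Next Boot" and only becomes active after the
# switch is power-cycled.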
Note: I also tried updating MLNX_OFED to version 5.8-6.0.4.2, but nothing changed.
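For reference, this is how the loaded stack can be confirmed after such an update:

# Confirm which OFED release and mlx5_core module are active:
ofed_info -s
modinfo mlx5_core | grep -i ^version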
Questions:
- Does the fallback to 1X EDR / 25 Gb/s result from a PAM4 signaling limitation or a signal-compatibility issue?
- The ConnectX-5 works correctly with passive (non-breakout) QSFP28 100G DAC cables. Why doesn't it achieve the same rate with the HDR breakout cable, even in SPLIT_2X mode?
- Are there any known compatibility constraints or design limitations regarding breakout cables with the ConnectX-5 MCX556A-ECAT?
Any suggestions or similar experience would be greatly appreciated.
Best regards,
Marcos Melo.