ConnectX-8 + Auxiliary Cards — Can’t Reach Full 800 Gb/s in IB XDR Link Test

Hi all,

I’m testing two ConnectX-8 adapters with two auxiliary cards under an InfiniBand XDR environment.
Each main card and its auxiliary partner are connected in a dual-port configuration (theoretically 800 Gb/s total).

However, I’m observing the following issue during bandwidth tests using ib_write_bw:

  • When testing a single pair of ports (one main card ↔ one auxiliary), I can reach about 448 Gb/s, which looks normal for one x16 Gen5 link.

  • But when I run tests on both pairs simultaneously (dual-to-dual), the throughput of each pair drops to around 237 Gb/s, so the aggregate stays at roughly 450 Gb/s instead of doubling as expected (the commands I use for the simultaneous run are sketched below).
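For reference, the simultaneous (dual-to-dual) run is started roughly as follows, with one ib_write_bw server/client pair per link on its own TCP port. The device pairing, TCP ports, and the address used for the out-of-band exchange are illustrative placeholders, not my exact command lines:

  # pair 1 (the log further down is from this pair): server first, then client
  ib_write_bw -d mlx5_1 -p 18515 --report_gbits --run_infinitely
  ib_write_bw -d mlx5_3 -p 18515 --report_gbits --run_infinitely <server_address>

  # pair 2, started at the same time on a different TCP port (this pairing is my guess)
  ib_write_bw -d mlx5_0 -p 18516 --report_gbits --run_infinitely
  ib_write_bw -d mlx5_2 -p 18516 --report_gbits --run_infinitely <server_address>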

ibstat on the host shows:

CA 'mlx5_0'
    CA type: MT4131
    Number of ports: 1
    Firmware version: 40.45.1200
    Hardware version: 0
    Node GUID: 0xcc40f303002f37ec
    System image GUID: 0xcc40f303002f37dc
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 800
        Base lid: 20
        LMC: 0
        SM lid: 19
        Capability mask: 0xa741ec48
        Port GUID: 0xcc40f303002f37ec
        Link layer: InfiniBand
CA 'mlx5_1'
    CA type: MT4131
    Number of ports: 1
    Firmware version: 40.45.1200
    Hardware version: 0
    Node GUID: 0xcc40f303002f37dc
    System image GUID: 0xcc40f303002f37dc
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 800
        Base lid: 19
        LMC: 0
        SM lid: 19
        Capability mask: 0xa751ec48
        Port GUID: 0xcc40f303002f37dc
        Link layer: InfiniBand
CA 'mlx5_2'
    CA type: MT4131
    Number of ports: 1
    Firmware version: 40.45.1200
    Hardware version: 0
    Node GUID: 0x5000e6030005563a
    System image GUID: 0x5000e6030005562a
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 800
        Base lid: 14
        LMC: 0
        SM lid: 19
        Capability mask: 0xa741ec48
        Port GUID: 0x5000e6030005563a
        Link layer: InfiniBand
CA 'mlx5_3'
    CA type: MT4131
    Number of ports: 1
    Firmware version: 40.45.1200
    Hardware version: 0
    Node GUID: 0x5000e6030005562a
    System image GUID: 0x5000e6030005562a
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 800
        Base lid: 10
        LMC: 0
        SM lid: 19
        Capability mask: 0xa751ec48
        Port GUID: 0x5000e6030005562a
        Link layer: InfiniBand
CA 'smi_test'
    CA type: MT4131
    Number of ports: 4
    Firmware version: 40.45.1200
    Hardware version: 0
    Node GUID: 0xcc40f303002f37dc
    System image GUID: 0xcc40f303002f37dc
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 19
        LMC: 0
        SM lid: 19
        Capability mask: 0xa750e84a
        Port GUID: 0xcc40f303002f37dc
        Link layer: InfiniBand
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 19
        LMC: 0
        SM lid: 19
        Capability mask: 0xa750e84a
        Port GUID: 0xcc40f303002f37dc
        Link layer: InfiniBand
    Port 3:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 19
        LMC: 0
        SM lid: 19
        Capability mask: 0xa750e84a
        Port GUID: 0xcc40f303002f37dc
        Link layer: InfiniBand
    Port 4:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 19
        LMC: 0
        SM lid: 19
        Capability mask: 0xa750e84a
        Port GUID: 0xcc40f303002f37dc
        Link layer: InfiniBand
CA 'smi_test1'
    CA type: MT4131
    Number of ports: 4
    Firmware version: 40.45.1200
    Hardware version: 0
    Node GUID: 0x5000e6030005562a
    System image GUID: 0x5000e6030005562a
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 10
        LMC: 0
        SM lid: 19
        Capability mask: 0xa750e848
        Port GUID: 0x5000e6030005562a
        Link layer: InfiniBand
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 10
        LMC: 0
        SM lid: 19
        Capability mask: 0xa750e848
        Port GUID: 0x5000e6030005562a
        Link layer: InfiniBand
    Port 3:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 10
        LMC: 0
        SM lid: 19
        Capability mask: 0xa750e848
        Port GUID: 0x5000e6030005562a
        Link layer: InfiniBand
    Port 4:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Base lid: 10
        LMC: 0
        SM lid: 19
        Capability mask: 0xa750e848
        Port GUID: 0x5000e6030005562a
        Link layer: InfiniBand

WARNING: BW peak won't be measured in this run.

                RDMA_Write BW Test

Dual-port       : OFF          Device         : mlx5_3
Number of qps   : 1            Transport type : IB
Connection type : RC           Using SRQ      : OFF
PCIe relax order: ON           Lock-free      : OFF
ibv_wr* API     : ON           Using DDP      : ON
TX depth        : 128
CQ Moderation   : 1
CQE Poll Batch  : 16
Mtu             : 4096[B]
Link type       : IB
Max inline data : 0[B]
rdma_cm QPs     : OFF
Data ex. method : Ethernet

local address: LID 0x0a QPN 0x0026 PSN 0x257c65 RKey 0x1fffbd VAddr 0x007ffa1a6e1000

remote address: LID 0x13 QPN 0x0027 PSN 0xbe93bb RKey 0x2004bd VAddr 0x007fe073dbe000

 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]    MsgRate[Mpps]
 65536      4288536        0.00               449.68                0.857690
 65536      4287020        0.00               449.52                0.857393
 65536      4283661        0.00               449.17                0.856733
 65536      4282700        0.00               449.07                0.856538
 65536      4282384        0.00               449.04                0.856473
 65536      4281972        0.00               448.99                0.856384
 65536      4280732        0.00               448.85                0.856115
 65536      2280147        0.00               239.09                0.456022
 65536      2262070        0.00               237.19                0.452409
 65536      2262046        0.00               237.19                0.452407
 65536      2262050        0.00               237.19                0.452408
 65536      2262091        0.00               237.19                0.452411
 65536      2262070        0.00               237.20                0.452414
 65536      2262002        0.00               237.18                0.452393
 65536      2261899        0.00               237.18                0.452380
 65536      2261903        0.00               237.18                0.452379
 65536      2261936        0.00               237.18                0.452388
 65536      2261975        0.00               237.18                0.452392
 65536      2261924        0.00               237.18                0.452379
 65536      2261947        0.00               237.18                0.452389
 65536      2261917        0.00               237.18                0.452381
 65536      2261911        0.00               237.18                0.452382

All auxiliary cards have been configured according to the ConnectX-8 User Manual, including the SMI-based setup and device linking procedure.

This makes me suspect that both cards might be sharing the same PCIe root complex or uplink, so the total host bandwidth is limited to ~450 Gb/s even though the optical links are capable of 800 Gb/s.
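The only way I know to check this is something like the commands below, but I'm not sure how the auxiliary cards are supposed to show up in the PCIe topology, so please treat this as a guess at the right approach:

  # map each IB device to its PCIe address
  readlink -f /sys/class/infiniband/mlx5_0/device
  readlink -f /sys/class/infiniband/mlx5_2/device

  # show the PCIe tree, to see whether the functions hang off the same root port or switch
  lspci -tv

  # check the negotiated link width/speed of a given function (address is a placeholder)
  sudo lspci -vvv -s <pci_address> | grep -E 'LnkCap|LnkSta'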

Questions:

  1. How can I confirm whether both ConnectX-8s (and their auxiliaries) are sharing the same PCIe x16 path or switch?

  2. Is there any firmware or hardware configuration (e.g., PCIe bifurcation, topology mode, or mlxconfig parameter) that allows the full 800 Gb/s to be utilized when running dual-pair traffic simultaneously?

  3. Are there recommended tools or settings (besides numactl, which is not available on this host) to achieve full line-rate performance across both adapters? The taskset-based pinning I'm considering as a workaround is sketched right after this list.
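In case it helps, this is roughly what I mean for question 3; the NUMA-node lookup and core ranges are assumptions on my part, and the devices/ports match the client side of the pairs sketched above:

  # which NUMA node is local to each adapter (-1 means the platform doesn't report one)
  cat /sys/class/infiniband/mlx5_3/device/numa_node
  cat /sys/class/infiniband/mlx5_2/device/numa_node

  # pin each client instance to cores on its adapter's local node (core ranges are placeholders)
  taskset -c 0-15  ib_write_bw -d mlx5_3 -p 18515 --report_gbits --run_infinitely <server_address>
  taskset -c 16-31 ib_write_bw -d mlx5_2 -p 18516 --report_gbits --run_infinitely <server_address>

Is this a reasonable approach, or is there a better-supported option for ConnectX-8?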

Any suggestions or clarifications about how ConnectX-8 + auxiliary cards handle PCIe bandwidth aggregation would be greatly appreciated.

Thanks in advance!

Dear Customer,

Thank you for reaching out to the NVIDIA Community.

Performance degradation can occur due to a variety of factors depending on the specifics of each case. To help us investigate further, please submit a technical case after collecting sysinfo-snapshot files from both servers and an ibdiagnet file. Use the following commands to gather the necessary data:

To capture a sysinfo snapshot (this tool is included by default with the MLNX_OFED driver), run:
sysinfo-snapshot.py

To collect ibdiagnet data, run this command on any server in your network and send us all generated files:
ibdiagnet --sc --extended_speeds all -P all=1 --pm_per_lane --get_cable_info -w ibdiagnet2.topo --cable_info_disconnected --get_phy_info --routing --sharp --phy_cable_disconnected --rail_validation --get_p_info

Note: If you receive a message that certain parameters are not supported, please remove those parameters and try running the command again.

After collecting the data, package the generated files with:
tar -czvf ibdiagnet2_$(date +%Y-%m-%d_%H).tar.gz /var/tmp/ibdiagnet2/*

You can submit your service request online at any time through the NVIDIA Enterprise Support Portal, or send it by email to the NVIDIA technical support mailbox: EnterpriseSupport@nvidia.com.

Thank you,
Mellanox Technical Support