Hi all,
I’m testing two ConnectX-8 adapters, each with an auxiliary card, in an InfiniBand XDR environment.
Each main card and its auxiliary partner are connected in a dual-port configuration (theoretically 800 Gb/s total).
However, I’m observing the following issue during bandwidth tests using ib_write_bw:
- When testing a single pair of ports (one main card ↔ one auxiliary), I can reach about 448 Gb/s, which looks normal for one x16 Gen5 link.
- When I run both pairs simultaneously (dual-to-dual), each pair drops to around 237 Gb/s, so the aggregate stays at roughly 450 Gb/s instead of doubling. A rough sketch of the invocations is just below.
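The pairing for the first run matches the LIDs in the perftest output further down; pair 2 is my guess at the remaining two devices. The TCP ports and <server> are placeholders, and the exact flags may differ slightly from what I actually ran:

# Pair 1: server on mlx5_1, client on mlx5_3
ib_write_bw -d mlx5_1 --report_gbits --run_infinitely -p 18515 &
ib_write_bw -d mlx5_3 --report_gbits --run_infinitely -p 18515 <server> &
# Pair 2, started while pair 1 is still running: server on mlx5_0, client on mlx5_2
ib_write_bw -d mlx5_0 --report_gbits --run_infinitely -p 18516 &
ib_write_bw -d mlx5_2 --report_gbits --run_infinitely -p 18516 <server> &

(--run_infinitely makes perftest print one bandwidth line per reporting interval, which matches the repeated rows in the output below.)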
For reference, the ibstat output on this host shows:
CA 'mlx5_0'
CA type: MT4131
Number of ports: 1
Firmware version: 40.45.1200
Hardware version: 0
Node GUID: 0xcc40f303002f37ec
System image GUID: 0xcc40f303002f37dc
Port 1:
State: Active
Physical state: LinkUp
Rate: 800
Base lid: 20
LMC: 0
SM lid: 19
Capability mask: 0xa741ec48
Port GUID: 0xcc40f303002f37ec
Link layer: InfiniBand
CA 'mlx5_1'
CA type: MT4131
Number of ports: 1
Firmware version: 40.45.1200
Hardware version: 0
Node GUID: 0xcc40f303002f37dc
System image GUID: 0xcc40f303002f37dc
Port 1:
State: Active
Physical state: LinkUp
Rate: 800
Base lid: 19
LMC: 0
SM lid: 19
Capability mask: 0xa751ec48
Port GUID: 0xcc40f303002f37dc
Link layer: InfiniBand
CA 'mlx5_2'
CA type: MT4131
Number of ports: 1
Firmware version: 40.45.1200
Hardware version: 0
Node GUID: 0x5000e6030005563a
System image GUID: 0x5000e6030005562a
Port 1:
State: Active
Physical state: LinkUp
Rate: 800
Base lid: 14
LMC: 0
SM lid: 19
Capability mask: 0xa741ec48
Port GUID: 0x5000e6030005563a
Link layer: InfiniBand
CA 'mlx5_3'
CA type: MT4131
Number of ports: 1
Firmware version: 40.45.1200
Hardware version: 0
Node GUID: 0x5000e6030005562a
System image GUID: 0x5000e6030005562a
Port 1:
State: Active
Physical state: LinkUp
Rate: 800
Base lid: 10
LMC: 0
SM lid: 19
Capability mask: 0xa751ec48
Port GUID: 0x5000e6030005562a
Link layer: InfiniBand
CA 'smi_test'
CA type: MT4131
Number of ports: 4
Firmware version: 40.45.1200
Hardware version: 0
Node GUID: 0xcc40f303002f37dc
System image GUID: 0xcc40f303002f37dc
Port 1:
State: Active
Physical state: LinkUp
Rate: 200
Base lid: 19
LMC: 0
SM lid: 19
Capability mask: 0xa750e84a
Port GUID: 0xcc40f303002f37dc
Link layer: InfiniBand
Port 2:
State: Active
Physical state: LinkUp
Rate: 200
Base lid: 19
LMC: 0
SM lid: 19
Capability mask: 0xa750e84a
Port GUID: 0xcc40f303002f37dc
Link layer: InfiniBand
Port 3:
State: Active
Physical state: LinkUp
Rate: 200
Base lid: 19
LMC: 0
SM lid: 19
Capability mask: 0xa750e84a
Port GUID: 0xcc40f303002f37dc
Link layer: InfiniBand
Port 4:
State: Active
Physical state: LinkUp
Rate: 200
Base lid: 19
LMC: 0
SM lid: 19
Capability mask: 0xa750e84a
Port GUID: 0xcc40f303002f37dc
Link layer: InfiniBand
CA 'smi_test1'
CA type: MT4131
Number of ports: 4
Firmware version: 40.45.1200
Hardware version: 0
Node GUID: 0x5000e6030005562a
System image GUID: 0x5000e6030005562a
Port 1:
State: Active
Physical state: LinkUp
Rate: 200
Base lid: 10
LMC: 0
SM lid: 19
Capability mask: 0xa750e848
Port GUID: 0x5000e6030005562a
Link layer: InfiniBand
Port 2:
State: Active
Physical state: LinkUp
Rate: 200
Base lid: 10
LMC: 0
SM lid: 19
Capability mask: 0xa750e848
Port GUID: 0x5000e6030005562a
Link layer: InfiniBand
Port 3:
State: Active
Physical state: LinkUp
Rate: 200
Base lid: 10
LMC: 0
SM lid: 19
Capability mask: 0xa750e848
Port GUID: 0x5000e6030005562a
Link layer: InfiniBand
Port 4:
State: Active
Physical state: LinkUp
Rate: 200
Base lid: 10
LMC: 0
SM lid: 19
Capability mask: 0xa750e848
Port GUID: 0x5000e6030005562a
Link layer: InfiniBand
WARNING: BW peak won't be measured in this run.
RDMA_Write BW Test
Dual-port : OFF Device : mlx5_3
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON Lock-free : OFF
ibv_wr* API : ON Using DDP : ON
TX depth : 128
CQ Moderation : 1
CQE Poll Batch : 16
Mtu : 4096[B]
Link type : IB
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
local address: LID 0x0a QPN 0x0026 PSN 0x257c65 RKey 0x1fffbd VAddr 0x007ffa1a6e1000
remote address: LID 0x13 QPN 0x0027 PSN 0xbe93bb RKey 0x2004bd VAddr 0x007fe073dbe000
bytes iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]
65536 4288536 0.00 449.68 0.857690
65536 4287020 0.00 449.52 0.857393
65536 4283661 0.00 449.17 0.856733
65536 4282700 0.00 449.07 0.856538
65536 4282384 0.00 449.04 0.856473
65536 4281972 0.00 448.99 0.856384
65536 4280732 0.00 448.85 0.856115
65536 2280147 0.00 239.09 0.456022
65536 2262070 0.00 237.19 0.452409
65536 2262046 0.00 237.19 0.452407
65536 2262050 0.00 237.19 0.452408
65536 2262091 0.00 237.19 0.452411
65536 2262070 0.00 237.20 0.452414
65536 2262002 0.00 237.18 0.452393
65536 2261899 0.00 237.18 0.452380
65536 2261903 0.00 237.18 0.452379
65536 2261936 0.00 237.18 0.452388
65536 2261975 0.00 237.18 0.452392
65536 2261924 0.00 237.18 0.452379
65536 2261947 0.00 237.18 0.452389
65536 2261917 0.00 237.18 0.452381
65536 2261911 0.00 237.18 0.452382
Both auxiliary cards were configured according to the ConnectX-8 User Manual, including the SMI-based setup and the device-linking procedure.
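For reference, the configuration can also be inspected with the MFT tools; a minimal sketch (the /dev/mst device name below is assumed from the MT4131 device ID and may differ on this host):

mst start
mst status -v                                  # maps mst devices to PCI addresses and mlx5_* names
mlxconfig -d /dev/mst/mt4131_pciconf0 query    # dump the current configuration of the first card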
This makes me suspect that both cards might be sharing the same PCIe root complex or uplink, so the host-side bandwidth tops out at ~450 Gb/s, roughly what a single x16 Gen5 link delivers, even though the optical links report 800 Gb/s.
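In case it helps frame the first question, this is the kind of check I can run to map each HCA to its PCIe root port and NUMA node (assuming the standard sysfs layout; the mlx5_* names are taken from the ibstat output above):

# Resolve each RDMA device to its PCI address; the full sysfs path shows the upstream bridges/root port
for dev in mlx5_0 mlx5_1 mlx5_2 mlx5_3; do
  echo "== $dev =="
  readlink -f /sys/class/infiniband/$dev/device
  cat /sys/class/infiniband/$dev/device/current_link_speed   # negotiated PCIe speed
  cat /sys/class/infiniband/$dev/device/current_link_width   # negotiated PCIe width
  cat /sys/class/infiniband/$dev/device/numa_node             # NUMA node the slot hangs off
done
# Tree view of the whole PCIe hierarchy, to spot a shared switch or root port
lspci -tv

If two of the devices resolve to paths under the same root port or switch, that would be consistent with the ~450 Gb/s ceiling.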
Questions:
- How can I confirm whether both ConnectX-8s (and their auxiliaries) are sharing the same PCIe x16 path or switch?
- Is there a firmware or hardware setting (e.g., PCIe bifurcation, a topology mode, or an mlxconfig parameter) that lets the full 800 Gb/s be used when both pairs carry traffic simultaneously?
- Are there recommended tools or settings, besides numactl (which is not available on this host), to reach full line rate across both adapters? A taskset-based sketch of what I can do instead is just below.
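Without numactl, the closest I can get to NUMA pinning is taskset against the CPU list that sysfs reports for each HCA; a minimal sketch, reusing the placeholder <server> from the earlier sketch:

# Pin each client to the CPUs local to its HCA (local_cpulist comes from the PCI device's sysfs entry)
cpus_pair1=$(cat /sys/class/infiniband/mlx5_3/device/local_cpulist)
cpus_pair2=$(cat /sys/class/infiniband/mlx5_2/device/local_cpulist)
taskset -c "$cpus_pair1" ib_write_bw -d mlx5_3 --report_gbits --run_infinitely -p 18515 <server> &
taskset -c "$cpus_pair2" ib_write_bw -d mlx5_2 --report_gbits --run_infinitely -p 18516 <server> &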
Any suggestions or clarifications about how ConnectX-8 + auxiliary cards handle PCIe bandwidth aggregation would be greatly appreciated.
Thanks in advance!