We are currently using two Mellanox 100G VPI NICs (MCX653106A-ECAT).
We noticed that over the InfiniBand connection, our RDMA Read/Write throughput cannot saturate the 100 Gbps link; it tops out at ~77 Gbps.
We use ib_read_bw/ib_write_bw for testing. Interestingly, with the same physical configuration but running RoCE instead, the throughput reaches ~93 Gbps.
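For reference, the invocations were along these lines (the device name and hostname below are placeholders for our setup; adjust them to yours):

```shell
# Server side (adjust -d to your HCA device):
ib_write_bw -d mlx5_0 --report_gbits

# Client side, pointing at the server's hostname:
ib_write_bw -d mlx5_0 --report_gbits janux-spr1

# Same pattern for RDMA Read:
ib_read_bw -d mlx5_0 --report_gbits janux-spr1
```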
The RDMA source server has PCIe Gen4 x16.
The RDMA request client has PCIe Gen3 x16.
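For what it's worth, neither link should be the bottleneck at 77 Gbps. A quick back-of-the-envelope check (our own arithmetic from the published signaling rates, not vendor numbers):

```python
# EDR: 4 lanes at 25.78125 Gb/s signaling, 64b/66b encoding
edr_data_rate = 25.78125 * (64 / 66) * 4
print(f"EDR 4x effective data rate: {edr_data_rate:.2f} Gbps")   # 100.00

# PCIe Gen3 x16: 8 GT/s per lane, 128b/130b encoding
pcie3_x16 = 8 * (128 / 130) * 16
print(f"PCIe Gen3 x16 raw data rate: {pcie3_x16:.2f} Gbps")      # 126.03

# PCIe Gen4 x16 doubles the per-lane rate
pcie4_x16 = 16 * (128 / 130) * 16
print(f"PCIe Gen4 x16 raw data rate: {pcie4_x16:.2f} Gbps")      # 252.06
```

So even the Gen3 side leaves roughly 26 Gbps of headroom above the EDR line rate (before TLP/header overhead), which is why a 77 Gbps ceiling is puzzling.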
We have tried swapping the client and server roles, but the throughput was still stuck at ~77 Gbps. We also tried perf, with the same result.
Our cable supports EDR. The link info from iblinkinfo and mlxlink is shown below:
CA: janux-03 HCA-2:
      0xb83fd20300595d01      2    1[  ] ==( 4X 25.78125 Gbps Active/ LinkUp)==>  1    1[  ] "janux-spr1 HCA-2" ( Could be 53.125 Gbps)
CA: janux-spr1 HCA-2:
      0xb83fd20300595fe1      1    1[  ] ==( 4X 25.78125 Gbps Active/ LinkUp)==>  2    1[  ] "janux-03 HCA-2" ( Could be 53.125 Gbps)
Operational Info
----------------
State                           : Active
Physical state                  : LinkUp
Speed                           : IB-EDR
Width                           : 4x
FEC                             : Standard LL RS-FEC - RS(271,257)
Loopback Mode                   : No Loopback
Auto Negotiation                : ON

Supported Info
--------------
Enabled Link Speed              : 0x00000035 (EDR,FDR,QDR,SDR)
Supported Cable Speed           : 0x0000003f (EDR,FDR,FDR10,QDR,DDR,SDR)

Troubleshooting Info
--------------------
Status Opcode                   : 0
Group Opcode                    : N/A
Recommendation                  : No issue was observed

Tool Information
----------------
Firmware Version                : 20.36.1010
amBER Version                   : 2.09
MFT Version                     : mft 4.23.1-7
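That output came from mlxlink; for completeness, the queries look like this (the PCI address below is a placeholder for our HCA, as listed by lspci or mst status):

```shell
# Query the port/link state of the HCA:
sudo mlxlink -d 81:00.0

# Confirm the negotiated PCIe link speed and width of the NIC:
sudo lspci -vv -s 81:00.0 | grep LnkSta
```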
We updated both NICs to firmware 20.36.1010.
The server is directly connected to the client, with NO switch or other components in between.
We have already tried every method we could find in the community forums and elsewhere online.
Please offer help if possible. Any help would be much appreciated.