Hey,
I’m currently working on a simple setup of two systems connected via two ConnectX-4 Lx NICs and a fiber-optic cable with SFP28 modules on both ends. We are aiming to use GPUDirect RDMA, although only one of the systems has a GDR-capable GPU (here: an RTX A5000). Currently I’m looking for advice on how to improve the RoCE throughput: we expect roughly 25 Gbit/s, but we were only able to achieve an average of 14.35 Gbit/s (as reported by ib_send_bw).
The setup looks as follows:
- System #1:
CPU: AMD Ryzen Threadripper PRO 3955WX 16-Cores
GPU: NVIDIA RTX A5000
NIC: ConnectX-4 Lx
- System #2:
CPU: AMD Ryzen 9 5900X 12-Core Processor
GPU: NVIDIA GeForce GTX 1650 (UUID: GPU-1e40bc5f-5675-d381-6ed0-ec9c0b990820)
NIC: ConnectX-4 Lx
Fiber-optic cable used: https://www.fs.com/de/products/40233.html?attribute=803&id=18479
Both NICs sit in PCIe 3.0 x16 slots, and the ConnectX-4 Lx negotiates a PCIe 3.0 x8 link (roughly 64 Gbit/s), so I would not expect PCIe to be the limiting factor. We ran the following commands:
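For reference, this is how I would double-check the negotiated link on each host (05:00.0 is the PCI address suggested by the enp5s0f0np0 netdev name on our side; adjust it to your own lspci output):
$ sudo lspci -s 05:00.0 -vvv | grep -E 'LnkCap:|LnkSta:'
I would expect both LnkCap and LnkSta to report Speed 8GT/s, Width x8.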
System #1 (Server):
$ ib_send_bw -F -d mlx5_0 -a --report_gbits
System #2 (Client):
$ ib_send_bw -F -d mlx5_0 -a 20.4.3.219 --report_gbits
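Once the plain RoCE numbers look right, I assume the GPUDirect path can be measured the same way, provided our perftest build has CUDA support (the GPU index 0 is just an assumption for the A5000 system):
$ ib_send_bw -F -d mlx5_0 -a --report_gbits --use_cuda=0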
This is the report by ib_send_bw:
---------------------------------------------------------------------------------------
Send BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
PCIe relax order: ON
ibv_wr* API : ON
RX depth : 512
CQ Moderation : 100
Mtu : 4096[B] (EDIT: use "ifconfig enpX mtu 9000" to reproduce this MTU)
Link type : Ethernet
GID index : 5
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x0092 PSN 0x52d594
GID: 00:00:00:00:00:00:00:00:00:00:255:255:20:04:03:219
remote address: LID 0000 QPN 0x0093 PSN 0x27fd9c
GID: 00:00:00:00:00:00:00:00:00:00:255:255:20:04:03:220
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps]
2 1000 0.000000 0.054278 3.392360
4 1000 0.000000 0.094776 2.961734
8 1000 0.00 0.22 3.407039
16 1000 0.00 0.44 3.402634
32 1000 0.00 0.87 3.404835
64 1000 0.00 1.76 3.434656
128 1000 0.00 3.49 3.408432
256 1000 0.00 6.96 3.397778
512 1000 0.00 12.31 3.004447
1024 1000 0.00 13.25 1.617233
2048 1000 0.00 13.77 0.840626
4096 1000 0.00 14.03 0.428310
8192 1000 0.00 14.21 0.216778
16384 1000 0.00 14.24 0.108639
32768 1000 0.00 14.31 0.054605
65536 1000 0.00 14.33 0.027337
131072 1000 0.00 14.34 0.013677
262144 1000 0.00 14.35 0.006841
524288 1000 0.00 14.35 0.003421
1048576 1000 0.00 14.35 0.001711
2097152 1000 0.00 14.35 0.000855
4194304 1000 0.00 14.35 0.000428
8388608 1000 0.00 14.35 0.000214
---------------------------------------------------------------------------------------
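In case the GID selection matters here, I assume the RoCE version behind GID index 5 (and the interface MTU) can be checked like this (show_gids ships with MLNX_OFED; the sysfs paths should also exist with the inbox driver):
$ show_gids mlx5_0
$ cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/5
$ cat /sys/class/net/enp5s0f0np0/mtu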
Other configuration information:
$ ibv_devinfo
hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 14.32.1010
node_guid: 08c0:eb03:00cb:9382
sys_image_guid: 08c0:eb03:00cb:9382
vendor_id: 0x02c9
vendor_part_id: 4117
hw_ver: 0x0
board_id: MT_2420110034
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
$ ibdev2netdev
mlx5_0 port 1 ==> enp5s0f0np0 (Up)
mlx5_1 port 1 ==> enp5s0f1np1 (Down)
$ ethtool enp5s0f0np0
Settings for enp5s0f0np0:
Supported ports: [ FIBRE ]
Supported link modes: 1000baseKX/Full
10000baseKR/Full
25000baseCR/Full
25000baseKR/Full
25000baseSR/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: None BaseR RS
Advertised link modes: 1000baseKX/Full
10000baseKR/Full
25000baseCR/Full
25000baseKR/Full
25000baseSR/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: RS
Link partner advertised link modes: Not reported
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 25000Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Cannot get wake-on-lan settings: Operation not permitted
Current message level: 0x00000004 (4)
link
Link detected: yes
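In case it helps, these are the counters and flow-control settings I would look at next to rule out drops or pause problems on the link (the exact statistic names vary between driver versions, so the grep pattern is only a guess):
$ ethtool -S enp5s0f0np0 | grep -Ei 'discard|drop|pause|out_of_buffer'
$ ethtool -a enp5s0f0np0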
Is there anything else I should take care of? I appreciate any help or advice on this topic.