What is the correct driver for ConnectX-4 Lx and ConnectX-6 Lx cards?

Hello, what are the correct drivers for my servers below? The NVIDIA Mellanox driver section confused me a lot. The only reason we installed these cards is to reach roughly 1 GB/s transfer speeds between servers; what are the correct driver and tuning for that?

  • The Red Hat inbox driver is available on servers 1, 2, 3, and 4.
  • The Debian inbox driver is available on server 5.
  • I can't get more than 600 MB/s between servers.
  • On server 1 at least, there should definitely be a way to reach about 1 GB/s.
  • These tests were done with the inbox drivers.
  • I used scp, sftp, and a file explorer to copy files between servers.
  • End-to-end jumbo frames are enabled on the switch the servers are connected to.
  • On the servers the MTU is currently the default 1500; I have tried 8000 and 4200 from time to time, but the speed hovers between 400 and 590 MB/s.
  • Which driver should it be, MLNX_OFED or MLNX_EN? There are also different versions of these drivers. (See the check commands below.)
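
For reference, this is how I check what each server is currently running; the interface name ens3f0 is just an example, and ofed_info only exists if MLNX_OFED/MLNX_EN is actually installed:

ethtool -i ens3f0                 # driver in use (mlx5_core for these cards) and firmware version
ip link show ens3f0               # current MTU
ip link set dev ens3f0 mtu 9000   # raise the MTU; must match the switch end to end
ofed_info -s                      # prints the MLNX_OFED version, if it is installed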

I need your help and guidance on this issue.

Thank you for your time and support.

Server 1:
Model : PowerEdge R7625
Operating System : Red Hat 8.8
RAM : 512 GB
Processor : AMD EPYC 9654 (4th Gen) x 2
Video Card : NVIDIA A16, 64 GB
Ethernet Card : NVIDIA Mellanox ConnectX-6 Lx 10/25GbE
a1:00.0 Ethernet controller: Mellanox Technologies MT2894 Family [ConnectX-6 Lx]
fw_ver: 26.36.1010
board_id: DEL0000000031
Disk Structure :
Root - RAID 1 : NVMe SSD
Home - RAID 1 : NVMe SSD

**server 1 Root Disk Benchmark**

[root@server1 temp]# dd if=/dev/nvme0n1 of=/dev/null bs=4k count=1000000 oflag=dsync
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB, 3.8 GiB) copied, 1.20203 s, 3.4 GB/s

**server 1 home disk benchmark**

[root@server1 temp]# dd if=/dev/sda of=/dev/null bs=4k count=1000000 oflag=dsync
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB, 3.8 GiB) copied, 0.580067 s, 7.1 GB/s
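
(A side note on the dd tests: oflag=dsync has no effect when the output is /dev/null, so these reads may be partly served from the page cache. A read test that bypasses the cache could look like the following; the device path is just an example.)

dd if=/dev/nvme0n1 of=/dev/null bs=1M count=4000 iflag=direct   # O_DIRECT read, page cache bypassed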

Server 2:
Model : PowerEdge R630
Operating System : Red Hat 7.9
RAM : 64 GB
Processor : Intel Xeon E5-2620 (24 threads) x 2
Ethernet Card : ConnectX-4 Lx 10GbE
Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
fw_ver: 14.32.1010
board_id: MT_2420110004
Disk Structure :
Root - RAID 1 : SSD
Home - RAID 1 : HDD

**server2 root disk benchmark**
[root@server2 ~]# dd if=/dev/sda of=/dev/null bs=4k count=1000000 oflag=dsync
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 2.34913 s, 1.7 GB/s

**server2 home disk benchmark**
[root@server2 ~]# dd if=/dev/sdb of=/dev/null bs=4k count=1000000 oflag=dsync
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 11.7919 s, 347 MB/s

Server 3:
Model : PowerEdge R640
Operating System : Red Hat 7.9
RAM : 256 GB
Processor : Intel Xeon Gold 5118 (48 threads) x 2
Ethernet Card : ConnectX-4 Lx 10GbE
Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
fw_ver: 14.32.1010
board_id: MT_2420110004
Disk Structure :
Root - RAID 1 : SSD
Home - RAID 1 : HDD

**server3 root disk benchmark**
[root@server3 ~]# dd if=/dev/sda of=/dev/null bs=4k count=1000000 oflag=dsync
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 3.65059 s, 1.1 GB/s

**server3 home disk benchmark**
[root@tutelserver3 ~]# dd if=/dev/sdb of=/dev/null bs=4k count=1000000 oflag=dsync
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 8.6707 s, 472 MB/s

Server 4:
Model : PowerEdge R630
Operating System : Red Hat 7.9
RAM : 64 GB
Processor : Intel Xeon E5-2670 (48 threads) x 2
Ethernet Card : ConnectX-4 Lx 10GbE
Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
fw_ver: 14.32.1010
board_id: MT_2420110004
Disk Structure :
Root - RAID 1 : SSD
Home - RAID 1 : HDD

**server4 ROOT disk benchmark**

[root@server4 ~]# dd if=/dev/sda of=/dev/null bs=4k count=1000000 oflag=dsync
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 4.0967 s, 1000 MB/s

**server4 home disk benchmark**

[root@server4 ~]# dd if=/dev/sdb of=/dev/null bs=4k count=1000000 oflag=dsync
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB) copied, 10.1786 s, 402 MB/s

Server 5:
Operating System : Debian 11.0 (OpenMediaVault)
Processor : Intel Core i7-7700
RAM : 8 GB
Disk Structure :
Root - RAID 1 : SSD
Home - RAID 1 : HDD
Ethernet Card : ConnectX-4 Lx 10GbE
Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
fw_ver: 14.32.1010
board_id: MT_2420110004

**server5 ROOT disk benchmark**

root@server5~# dd if=/dev/nvme0n1 of=/dev/null bs=4k count=1000000 oflag=dsync
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB, 3.8 GiB) copied, 3.07624 s, 1.3 GB/s

**server5 HOME disk benchmark**

root@server5:~# dd if=/dev/sda of=/dev/null bs=4k count=1000000 oflag=dsync
1000000+0 records in
1000000+0 records out
4096000000 bytes (4.1 GB, 3.8 GiB) copied, 15.3024 s, 268 MB/s

Copying files with scp, sftp, or a file explorer may not be able to push the link to full performance.

To isolate the issue, we can test RDMA first, for example:
Server: ib_write_bw -d mlx5_0 -i 1 -x 3 --report_gbits -F -n 5000 -q 2 -m 4096 -s 65536 --run_infinitely
Client: ib_write_bw -d mlx5_0 -i 1 -x 3 --report_gbits -F -n 5000 -q 2 -m 4096 -s 65536 --run_infinitely <server_ip>

If the bandwidth reaches about 90% of the NIC bandwidth, then try iperf to test TCP, for example:
server: iperf -s
client: iperf -c <server_ip> -t 60 -P 5

Best Regards,
Levei

I get the following error when I run the test on the client side. How can I solve this problem?

Port number 1 state is Down
Couldn’t set the link layer
Couldn’t get context for the device
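
(For reference, a few checks that can help diagnose this, assuming the device is mlx5_0; the interface name is only an example, and show_gids is a script shipped with MLNX_OFED:)

ibv_devices              # list RDMA devices; confirm mlx5_0 is present on this host
ibv_devinfo -d mlx5_0    # port state should be PORT_ACTIVE and link_layer Ethernet
ethtool ens3f0           # the matching interface should show "Link detected: yes"
show_gids                # list GID indexes; the -x value should point to a RoCE v2 GID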

With iperf3, on the other hand, I get the following results. My real problem is that the copy speed between the two servers is slow; how can I find the reason for this?

[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.10 GBytes  9.44 Gbits/sec    0    991 KBytes
[  5]   1.00-2.00   sec  1.09 GBytes  9.41 Gbits/sec    0    991 KBytes
[  5]   2.00-3.00   sec  1.10 GBytes  9.42 Gbits/sec    0   1.03 MBytes
[  5]   3.00-4.00   sec  1.10 GBytes  9.42 Gbits/sec    0   1.03 MBytes
[  5]   4.00-5.00   sec  1.10 GBytes  9.42 Gbits/sec    0   1.03 MBytes
[  5]   5.00-6.00   sec  1.10 GBytes  9.42 Gbits/sec    0   1.03 MBytes
[  5]   6.00-7.00   sec  1.09 GBytes  9.41 Gbits/sec    0   1.03 MBytes
[  5]   7.00-8.00   sec  1.10 GBytes  9.42 Gbits/sec    0   1.03 MBytes
[  5]   8.00-9.00   sec  1.10 GBytes  9.42 Gbits/sec    0   1.03 MBytes
[  5]   9.00-10.00  sec  1.10 GBytes  9.42 Gbits/sec    0   1.03 MBytes

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.0 GBytes  9.42 Gbits/sec    0             sender
[  5]   0.00-10.04  sec  11.0 GBytes  9.38 Gbits/sec                  receiver
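
(Since iperf3 already shows about 9.4 Gbit/s, the link itself is fine; that points at the copy tool, the ssh cipher, or the disks. A rough way to test the transfer path without ssh, assuming nc/ncat is available; the port and file path are examples:)

receiver: nc -l 5001 > /dev/null                            # some netcat variants need: nc -l -p 5001
sender:   dd if=/path/to/largefile bs=1M | nc <receiver_ip> 5001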

Thank you for your help. Just as you said, sftp and the file explorer were somehow limiting performance. With NFS, though, I was able to reach speeds of about 900 MB/s. Thanks for this valuable information. Our users stick to sftp and scp because they are easy, so I wonder if there is a fast option other than NFS. I need to investigate it.
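
One thing that may be worth trying if users must stay on scp/sftp: the single encryption stream is often the bottleneck, so forcing a faster AEAD cipher can help (aes128-gcm@openssh.com is a standard OpenSSH cipher name; the gain depends on CPU AES support and the sshd configuration; file and host names are examples):

scp -c aes128-gcm@openssh.com bigfile user@server2:/data/
rsync -a -e "ssh -c aes128-gcm@openssh.com" bigfile user@server2:/data/

Running several transfers in parallel can also spread the load across CPU cores.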
