Poor performance with ConnectX-5


I have two SuperMicro servers running RHEL each with one of these cards:

Device type: ConnectX5
Name: MCX516A-CCA_Ax
Description: ConnectX-5 EN network interface card; 100GbE dual-port QSFP28; PCIe3.0 x16; tall bracket; ROHS R6

Configured as LACP bond on the hosts with 40G optics:


ethtool bond0

Settings for bond0:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 80000Mb/s
Duplex: Full
Auto-negotiation: off
Port: Other
Transceiver: internal
Link detected: yes

iperf3 -i 5 -s
iperf3 -i 5 -t 60 -c beast.drcmr

[root@beauty ~]# iperf3 -i 5 -t 60 -c beast.drcmr
Connecting to host beast.drcmr, port 5201
[ 5] local port 36664 connected to port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-5.00 sec 2.04 GBytes 3.50 Gbits/sec 0 2.34 MBytes
[ 5] 5.00-10.00 sec 2.06 GBytes 3.55 Gbits/sec 0 3.11 MBytes
[ 5] 10.00-15.00 sec 2.06 GBytes 3.53 Gbits/sec 0 3.11 MBytes
[ 5] 15.00-20.00 sec 1.96 GBytes 3.37 Gbits/sec 0 3.11 MBytes
[ 5] 20.00-25.00 sec 1.96 GBytes 3.38 Gbits/sec 0 3.11 MBytes
[ 5] 25.00-30.00 sec 1.96 GBytes 3.37 Gbits/sec 0 3.11 MBytes
[ 5] 30.00-35.00 sec 1.96 GBytes 3.37 Gbits/sec 0 3.11 MBytes
[ 5] 35.00-40.00 sec 1.93 GBytes 3.32 Gbits/sec 0 3.11 MBytes
[ 5] 40.00-45.00 sec 2.01 GBytes 3.45 Gbits/sec 0 3.11 MBytes
[ 5] 45.00-50.00 sec 2.00 GBytes 3.44 Gbits/sec 0 3.11 MBytes
[ 5] 50.00-55.00 sec 1.98 GBytes 3.40 Gbits/sec 0 3.11 MBytes
[ 5] 55.00-60.00 sec 1.96 GBytes 3.37 Gbits/sec 0 3.11 MBytes

[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-60.00 sec 23.9 GBytes 3.42 Gbits/sec 0 sender
[ 5] 0.00-60.04 sec 23.9 GBytes 3.42 Gbits/sec receiver

Any suggestions as to why? Those numbers seem way off.




Hello Torkil,

Thank you for posting your inquiry to the NVIDIA Developer Forums.

We do not recommend using iperf3 for TCP benchmarking on Linux hosts.
iperf3 lacks several features that iperf2 provides, such as multithreading (and multicast test capabilities).
Multithreaded (parallel) testing across multiple cores is a much more realistic indicator of real-world throughput than single-stream, single-threaded performance.

A quick example of iperf2 testing, using 8 cores, can be found here:
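As a rough sketch of such a test, assuming iperf2's standard flags and reusing the hostnames from this thread (adjust the interval, duration, and stream count to taste):

```shell
# Server side (on beast.drcmr): start a single iperf2 listener
iperf -s

# Client side (on beauty): 8 parallel streams (-P 8), 60-second run,
# reporting every 5 seconds -- the aggregate of all streams is what
# should approach line rate, not any individual stream
iperf -c beast.drcmr -P 8 -t 60 -i 5
```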

If you are still experiencing lower-than-expected throughput while using iperf2, we would recommend reviewing our comprehensive host tuning guide, available here:

General OS tuning guidelines can be found here, as well as Mellanox-specific tuning guidelines.
We also discuss the importance of NUMA-locality and provide instructions for pinning your applications to local CPU cores (https://enterprise-support.nvidia.com/s/article/understanding-numa-node-for-performance-benchmarks).
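As a minimal sketch of that NUMA-pinning advice (the interface name `ens1f0` is an assumption; substitute one of your bond's slave interfaces):

```shell
# Find which NUMA node the NIC is attached to
# (prints a node number, or -1 on single-node systems)
cat /sys/class/net/ens1f0/device/numa_node

# Run the benchmark pinned to that node's CPUs and memory,
# e.g. if the NIC reported node 0:
numactl --cpunodebind=0 --membind=0 iperf -c beast.drcmr -P 8 -t 60
```

Crossing the inter-socket interconnect on a dual-socket Supermicro box can cost a significant fraction of 100GbE throughput, so keeping the traffic generator on the NIC-local node matters.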

If after following these guidelines you are still not able to reach line rate (or near line rate), and you have valid Enterprise support entitlement, we would recommend engaging our Enterprise support team via the NVIDIA Enterprise support portal (https://enterprise-support.nvidia.com/s/create-case).

Thanks, and have a great day,
NVIDIA Enterprise Support


Thanks for the links. iperf2 does indeed show reasonable numbers for all but two hosts.

Can you also provide a link to the mlnx_tune script? I find it referenced a lot but with broken links.



Hi Torkil,

The mlnx_tune script is bundled with MLNX_OFED. This is our proprietary driver stack.

You can also find mlnx_tune on the Mellanox userland tools and scripts GitHub:

(It’s within the Python directory)
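As a quick usage sketch once you have the script (profile names vary by mlnx_tune version; HIGH_THROUGHPUT is one commonly listed profile, so verify with the report output first):

```shell
# Report the current system tuning status and any detected issues
mlnx_tune -r

# Apply a throughput-oriented tuning profile
# (run mlnx_tune with no arguments to list available profiles)
mlnx_tune -p HIGH_THROUGHPUT
```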

NVIDIA Enterprise Support

What's the Windows version of mlnx_tune?
I'm seeing 56 Gb/s on a 100GbE adapter with iperf2.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.