Mellanox ConnectX-4 VPI in 100GbE Ethernet mode tops out at ~52Gbps

Hello all.

I've got two ConnectX-4 VPI cards in Ethernet mode. One card is in a brand-new dual-socket Intel Xeon E5 v4 host and the other is in a still fairly new dual-socket Xeon E5 v3 host. Both hosts run fully patched CentOS 7.3 with the current OFED 4.x release.

Only one port is in use on each card. They are connected with Mellanox-branded 100G cables to a Juniper QFX5200 100G switch.

The switch ports show no errors or transmission issues, the links look clean, and they have negotiated at 100G.

Using iperf3 or FDT, even with significant NIC tuning and multi-threaded transfer tests, I only see around 52Gbps of throughput. For example:

10/03 19:57:17 Net Out: 52.500 Gb/s Avg: 52.456 Gb/s
10/03 19:57:22 Net Out: 52.534 Gb/s Avg: 52.459 Gb/s
10/03 19:57:27 Net Out: 52.393 Gb/s Avg: 52.456 Gb/s
10/03 19:57:32 Net Out: 52.682 Gb/s Avg: 52.465 Gb/s
10/03 19:57:37 Net Out: 52.598 Gb/s Avg: 52.470 Gb/s
10/03 19:57:42 Net Out: 52.333 Gb/s Avg: 52.464 Gb/s
10/03 19:57:47 Net Out: 52.494 Gb/s Avg: 52.465 Gb/s
10/03 19:57:52 Net Out: 52.588 Gb/s Avg: 52.469 Gb/s
10/03 19:57:57 Net Out: 52.576 Gb/s Avg: 52.473 Gb/s
10/03 19:58:02 Net Out: 52.406 Gb/s Avg: 52.470 Gb/s
10/03 19:58:07 Net Out: 52.489 Gb/s Avg: 52.471 Gb/s
10/03 19:58:12 Net Out: 52.521 Gb/s Avg: 52.472 Gb/s
10/03 19:58:17 Net Out: 52.463 Gb/s Avg: 52.472 Gb/s
10/03 19:58:22 Net Out: 52.531 Gb/s Avg: 52.473 Gb/s
10/03 19:58:27 Net Out: 52.604 Gb/s Avg: 52.477 Gb/s
10/03 19:58:32 Net Out: 52.385 Gb/s Avg: 52.474 Gb/s
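For what it's worth, the iperf3 runs were multi-stream invocations along these lines (an illustrative sketch only; the exact stream counts and durations varied, and <server_ip> stands in for the receiving host):

iperf3 -s (on the receiver)

iperf3 -c <server_ip> -P 8 -t 60 (on the sender, 8 parallel streams for 60 seconds)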

I've tried numactl core pinning, setting the CPU frequency governor to performance, tuning the NIC ring buffers and TCP/IP buffers, txqueuelen tweaks, etc.
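To give an idea of the buffer and queue tuning, it was along these lines (illustrative values only, not necessarily the exact ones used; <interface> is the ConnectX-4 port):

sysctl -w net.core.rmem_max=268435456

sysctl -w net.core.wmem_max=268435456

sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"

sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"

ip link set <interface> txqueuelen 10000

ethtool -G <interface> rx 8192 tx 8192 (ring sizes, within whatever maximums ethtool -g reports)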

Just wondering what I might be missing?

Thank you.

z

Hi Zebra,

  1. Have you validated that the HCA firmware is aligned with the Mellanox OFED driver? Please consult the release notes (RN) of the driver.
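One way to confirm the installed versions (assuming <interface> is the ConnectX-4 port; ofed_info ships with MLNX_OFED):

# ethtool -i <interface> (reports driver, version and firmware-version)

# ofed_info -s (prints the installed MLNX_OFED release)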

  2. Check the PCIe generation and link width (look at the "LnkCap" and "LnkSta" lines): is the card running at PCIe generation 3.0 with a x8 or x16 link?

#lspci -v | grep -i mel

#lspci -s <bus:slot.func> -vvv (e.g. 04:00.0)
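To pull out just the link lines (device address illustrative):

# lspci -s 04:00.0 -vvv | grep -E "LnkCap|LnkSta"

As a rough sanity check on the arithmetic: PCIe Gen3 signals at 8 GT/s per lane with 128b/130b encoding, so a x8 link carries at most about 8 x 8 x 128/130 ≈ 63 Gb/s raw, and less after protocol overhead, while x16 roughly doubles that. A card that trained (or fell back) to Gen3 x8 would plateau in the same neighbourhood as the ~52 Gb/s observed here, so the negotiated width in "LnkSta" is worth checking carefully.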

  3. Regarding tuning, have you used our Performance Tuning Guide for Mellanox Network Adapters?

http://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters.pdf

  4. Are the results similar with iperf version 2.x, run from both the client and the server side, using a parallelism of 4?

(Make sure you run it under "taskset", pinned to the CPUs closest to the NUMA node where the HCA is installed.)
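A possible invocation, assuming the NIC sits on NUMA node 0 and cores 0-7 are local to it (adjust to your topology; <server_ip> is a placeholder):

# cat /sys/class/net/<interface>/device/numa_node (reports which NUMA node the NIC hangs off)

# taskset -c 0-7 iperf -s (server side)

# taskset -c 0-7 iperf -c <server_ip> -P 4 -t 60 (client side, 4 parallel streams)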

  5. Verify offloads (ethtool -k <interface>):

generic-segmentation-offload: on

generic-receive-offload: on

large-receive-offload: on
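For example, to check and, if needed, enable these on a given port (interface name is a placeholder):

ethtool -k <interface> | grep -E "segmentation-offload|receive-offload"

ethtool -K <interface> gso on gro on lro on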

  6. In some Linux distributions, Hardware LRO (HW LRO) must be enabled to reach the required line-rate performance.

To enable HW LRO:

ethtool --set-priv-flags <interface> hw_lro on (default: off)
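You can confirm the current private-flag settings with the matching query (interface name is a placeholder):

ethtool --show-priv-flags <interface>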

  7. In case "tx-nocache-copy" is enabled (this is the case for some kernels, e.g. kernel 3.10, which is the default for RHEL 7.0), "tx-nocache-copy" should be disabled.

To disable "tx-nocache-copy":

ethtool -K <interface> tx-nocache-copy off
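To see whether it is currently enabled (interface name is a placeholder):

ethtool -k <interface> | grep tx-nocache-copy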

  8. Our Performance Tuning Guide contains this information plus other recommended tuning (e.g. power management, NUMA architecture tuning, interrupt moderation tuning).
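As a flavour of those extra knobs (illustrative only; the guide has the authoritative recommendations, and the interface name is a placeholder):

cpupower frequency-set -g performance (pin the CPU frequency governor)

ethtool -C <interface> adaptive-rx on (adaptive receive interrupt moderation)

ethtool -C <interface> adaptive-rx off rx-usecs 8 rx-frames 128 (or fixed coalescing values instead)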

  9. You can also use our "mlnx_tune" utility for automatic tuning and compare the results (mlnx_tune --help).

Sophie.