Low throughput on Mellanox ConnectX-4 via VXLAN tunnel

Hello,

We are trying to set up two Linux servers with Mellanox ConnectX-4 NICs. The servers are connected to each other via a VXLAN tunnel, and we are having throughput issues.

We ran some tests with standard IP forwarding and measured bandwidth up to 39 Gbit/s, but through the VXLAN tunnel the throughput drops to around 10 Gbit/s.
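(For anyone reproducing: a typical measurement looks like the iperf3 run below; the tool choice, the address 10.0.0.2, and the flags are illustrative, not our exact invocation.)

# on the receiving server
[root@frr-lab2 ~]# iperf3 -s

# on the sending server: 8 parallel streams for 30 seconds toward the receiver's tunnel address
[root@frr-lab ~]# iperf3 -c 10.0.0.2 -P 8 -t 30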

CPU usage with IP forwarding is less than 5%. During the VXLAN test, a single core on the receiving server is utilized at 100%; the process consuming the CPU is “ksoftirqd/14”.
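(Two quick ways to confirm that all receive processing lands on one core; output omitted here:)

# per-CPU NET_RX softirq counters; one column climbing much faster than the rest points to a single busy queue
[root@frr-lab2 ~]# watch -d 'grep NET_RX /proc/softirqs'

# which CPU each mlx5 completion-queue interrupt is pinned to
[root@frr-lab2 ~]# grep mlx5 /proc/interrupts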

We verified that VXLAN offload is enabled on both servers, as per the documentation:

[root@frr-lab ~]# ls /sys/kernel/debug/mlx5/0000:05:00.0/VXLAN/

4789

[root@frr-lab2 ~]# ls /sys/kernel/debug/mlx5/0000:05:00.0/VXLAN/

4789

We ran “mlnx_tune -p HIGH_THROUGHPUT” on both servers, disabled irqbalance, and used the set_irq_affinity.sh script to bind multiple cores to the NIC.
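(For completeness, the tuning steps were along these lines; the interface name ens1f0 is a placeholder for our actual port:)

[root@frr-lab ~]# mlnx_tune -p HIGH_THROUGHPUT
[root@frr-lab ~]# systemctl stop irqbalance
[root@frr-lab ~]# systemctl disable irqbalance
# spread the NIC's interrupt vectors across the cores
[root@frr-lab ~]# set_irq_affinity.sh ens1f0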

Below you can find some more information about our servers:

OS: Fedora 28

Kernel: 4.16.3-301.fc28.x86_64

Mellanox OFED Driver: mlnx-en-4.5-1.0.1.0-fc28-x86_64

ConnectX-4 Firmware: 14.24.1000

System Resources (per server):

2 x Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz

128 GB of Memory

VXLAN is configured via a standard Linux bridge, as per your documentation.
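(A sketch of the configuration on one side; the VNI, addresses, and interface name below are placeholders, not our exact values:)

# VXLAN device on top of the ConnectX-4 port, standard UDP port 4789
[root@frr-lab ~]# ip link add vxlan100 type vxlan id 100 dstport 4789 local 192.168.1.1 remote 192.168.1.2 dev ens1f0

# plain Linux bridge with the VXLAN device enslaved
[root@frr-lab ~]# ip link add br0 type bridge
[root@frr-lab ~]# ip link set vxlan100 master br0
[root@frr-lab ~]# ip link set vxlan100 up
[root@frr-lab ~]# ip link set br0 up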

Thank you very much for your time

Hi,

Could you please check your “ethtool -k” output to see whether all the required features are enabled?

See https://community.mellanox.com/s/article/understanding-vxlan-hardware-stateless-offload-ethtool-parameters-for-connectx-4
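For ConnectX-4 the VXLAN-related entries should look roughly like this (the interface name ens1f0 is just an example):

[root@frr-lab ~]# ethtool -k ens1f0 | grep tnl
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on

If either shows “off”, it can be toggled with, for example, “ethtool -K ens1f0 tx-udp_tnl-segmentation on”.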

Thanks

Marc