ConnectX-3 Pro VXLAN Performance Overhead

Hi,

I’m testing out ConnectX-3 Pro with VXLAN offload in our lab. Using a single-stream iperf performance test, we get ~34 Gbit/s for non-VXLAN transport, but only ~28 Gbit/s with VXLAN encapsulation.

In both cases the bottleneck is the CPU on the receiving side. Looking at a perf dump, these are the top consumers:

Without VXLAN:

  • 24.27% iperf [kernel.kallsyms] [k] copy_user_enhanced_fast_string

  • 6.49% iperf [kernel.kallsyms] [k] mlx4_en_process_rx_cq

  • 5.34% iperf [kernel.kallsyms] [k] tcp_gro_receive

  • 3.43% iperf [kernel.kallsyms] [k] dev_gro_receive

  • 3.28% iperf [kernel.kallsyms] [k] mlx4_en_complete_rx_desc

  • 3.05% iperf [kernel.kallsyms] [k] memcpy

  • 2.88% iperf [kernel.kallsyms] [k] inet_gro_receive

With VXLAN:

  • 20.06% iperf [kernel.kallsyms] [k] copy_user_enhanced_fast_string

  • 6.04% iperf [kernel.kallsyms] [k] mlx4_en_process_rx_cq

  • 5.43% iperf [kernel.kallsyms] [k] inet_gro_receive

  • 3.29% iperf [kernel.kallsyms] [k] dev_gro_receive

  • 3.24% iperf [kernel.kallsyms] [k] tcp_gro_receive

  • 3.08% iperf [kernel.kallsyms] [k] skb_gro_receive

  • 3.02% iperf [kernel.kallsyms] [k] memcpy

  • 2.85% iperf [kernel.kallsyms] [k] mlx4_en_complete_rx_desc

This is CentOS 6.5, kernel 3.15.0, firmware 2.31.5050.
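For reference, the profiles above come from a plain perf run; a minimal sketch of the kind of invocation (the exact options here are an assumption, not a record of what was run):

# Sample the whole system, with call graphs, for 30 s while the iperf stream is running:
perf record -a -g -- sleep 30
# Summarize the hottest symbols, as in the lists above:
perf report --stdio --sort symbol | head -30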

We’re certainly happy with 28 Gbit/s, but I’m wondering whether there are plans to improve this to the point that VXLAN adds no additional CPU overhead at all, or whether there is any tuning I can do toward the same goal?

  • Thorvald

About PlumGrid:

PlumGrid and Mellanox published a new white paper about creating a better network infrastructure for a large-scale OpenStack cloud by using Mellanox’s ConnectX-3 Pro VXLAN HW offload.

The PlumGrid VNI (Virtual Network Infrastructure) running over Mellanox switches and ConnectX-3 Pro adapters is a unique offering targeted for large-scale data centers.

With the ConnectX-3 Pro stateless HW offload, users can achieve:

  • Linear improvement in VM performance, up to near line-rate performance (36 Gbps with eight VM pairs generating traffic at maximum rates).

  • Virtually constant CPU utilization on both the TX and RX ends while throughput grows to 36 Gbps.

The white paper is available on the PlumGrid website: http://www.plumgrid.com/wp-content/uploads/documents/PLUMgrid_Mellanox_WP.pdf

PlumGrid VNI 3.0 is a software networking product for large-scale OpenStack clouds. It provides a network-fabric-agnostic, turnkey solution to build a scalable cloud infrastructure and offer advanced, on-demand network services to cloud tenants. To find out more, see http://www.plumgrid.com/product/overview/

#!/bin/bash
set -x

DEV=mlx4   # the ConnectX-3 Pro port (ethX renamed to mlx4)
NET=21     # per-host octet; set differently on each machine

# Reset the physical port and remove any previous VXLAN device.
ip addr flush dev $DEV
ip link set dev $DEV down
ip link del vxlan0

# Underlay: jumbo frames, address, and route on the physical port.
ip link set dev $DEV mtu 9000
ip addr add 10.224.$NET.27/24 brd + dev $DEV
ip link set dev $DEV up
ip route add 10.224.0.0/12 via 10.224.$NET.1

# Overlay: VXLAN VNI 17, multicast group 239.1.1.17, over the physical port.
ip link add vxlan0 type vxlan id 17 group 239.1.1.17 dev $DEV
ip addr add 172.18.1.$NET/24 brd + dev vxlan0
ip link set dev vxlan0 up

This is run on both machines (with a different NET value on each), bare metal with no VMs. mlx4 is the ethX device, renamed.
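For reference, the single-stream test is plain iperf between the two hosts, roughly along these lines (the exact options are an assumption; the addresses come from the script above):

# On the receiver (NET=21):
iperf -s
# On the sender, against the underlay address for the non-VXLAN baseline:
iperf -c 10.224.21.27 -t 60 -i 10
# ...and against the vxlan0 address for the encapsulated run:
iperf -c 172.18.1.21 -t 60 -i 10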

MTU 9000 is a new addition; with it I get ~38 Gbit/s when doing single-stream TCP testing on the mlx4 device, but VXLAN-encapsulated traffic stays at ~24 Gbit/s, CPU-bound on a single core.
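One thing worth double-checking (my assumption, not something confirmed in this thread) is that the vxlan0 MTU leaves room for the roughly 50 bytes of VXLAN/UDP/IP encapsulation on top of the 9000-byte underlay MTU:

# Show the MTU the kernel assigned to the tunnel device:
ip -d link show vxlan0
# If needed, set it explicitly to the underlay MTU minus ~50 bytes of encapsulation overhead:
ip link set dev vxlan0 mtu 8950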

The performance I am seeing is close to what you show in DOC-1456 for one VM pair. While I can get high aggregate throughput by running multiple streams, I could get similar aggregate performance by bonding four 10 Gbit/s connections. I’m really hoping to improve our single-stream speed.
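(For completeness, the multi-stream runs mentioned above are simply the same client invocation with parallel streams; the exact stream count shown is an assumption:)

# Four parallel TCP streams against the receiver's vxlan0 address:
iperf -c 172.18.1.21 -P 4 -t 60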

Hi Thorvald,

Did you run this test VM to VM or within the hypervisor? I assume VM to VM.

Is this only one flow (one VM) or more (several VMs on the same host)?

Which CPU are you using? How many cores? How much memory?

Do you use PCIe Gen3? (I assume you do)

Do you use MTU=1500?

If possible, try to run 2 or 4 VMs and see how it goes; it should be better.

The performance looks OK, but you could reach better numbers (close to line rate).

See this post: https://community.mellanox.com/s/article/vxlan-considerations-for-connectx-3-pro
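For example, you can verify that the VXLAN offloads are actually active on the port with ethtool (a sketch; the device name is taken from your script, and the exact feature names depend on the kernel):

# Check that UDP tunnel segmentation, RX checksumming, and GRO are enabled on the port:
ethtool -k mlx4 | grep -E 'udp_tnl|rx-checksumming|generic-receive-offload'
# If GRO is off, enable it on both the physical port and the tunnel device:
ethtool -K mlx4 gro on
ethtool -K vxlan0 gro on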

I added a performance slide and a link to the PlumGrid case study:

http://www.plumgrid.com/wp-content/uploads/documents/PLUMgrid_Mellanox_WP.pdf

Thanks,

Ophir.