I have Mellanox ConnectX-3 VPI IB/Ethernet adapters (MCX354A-FCBT-A4, PCIe Gen3 x8), but I cannot get 40Gbps in Ethernet mode. No matter what I have tried so far, I cannot exceed ~23Gbps when running iperf.
My test setup is as follows:
Two identical HP ProLiant DL360p Gen8 servers, each equipped with two quad-core Intel(R) Xeon(R) E5-2609 0 @ 2.40GHz CPUs and 32GB of RAM. The OS is Ubuntu Linux 12.04.5 with kernel 3.2.0-70-generic and Mellanox OFED 2.3-1.0.1. In the BIOS, everything is set to maximum performance.
The ConnectX-3 cards are connected back to back (no switch) with a Mellanox FDR copper cable (1m long).
The mlx4_core module is probed with the following options:
# The following two lines are added by the driver by default:
options mlx4_core fast_drop=1
options mlx4_core log_num_mgm_entry_size=-1
# The following line is added by me:
options mlx4_core num_vfs=16 port_type_array=2,2 probe_vf=0 enable_sys_tune=1
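For reference, confirming that the running module actually picked these options up looks roughly like this (assuming mlx4_core exports its parameters read-only under sysfs):
# Runtime values of the mlx4_core parameters (paths assume a standard sysfs export)
cat /sys/module/mlx4_core/parameters/port_type_array
cat /sys/module/mlx4_core/parameters/num_vfs
cat /sys/module/mlx4_core/parameters/log_num_mgm_entry_size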
I have read the Performance Tuning Guidelines for Mellanox Network Adapters and have tried countless combinations of the suggested tuning parameters, but I cannot reach 40Gbps.
Here is a list of the typical commands I run on both servers for eth6, which is the port I use for my experiments:
~# ibdev2netdev
mlx4_0 port 1 ==> eth6 (Up)
mlx4_0 port 2 ==> eth7 (Down)
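For reference, checking that eth6 really negotiated a 40Gb/s link and that the adapter trained a PCIe Gen3 x8 link looks roughly like this (the PCI address is just a placeholder):
ethtool eth6 | grep -i speed          # should report Speed: 40000Mb/s
lspci | grep -i mellanox              # find the adapter's PCI address
lspci -vv -s 07:00.0 | grep LnkSta    # placeholder address; should show Speed 8GT/s, Width x8
Then the tuning commands themselves: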
sysctl -w net.ipv4.tcp_timestamps=0
sysctl -w net.ipv4.tcp_sack=1
sysctl -w net.core.netdev_max_backlog=250000
sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_max=4194304
sysctl -w net.core.rmem_default=4194304
sysctl -w net.core.wmem_default=4194304
sysctl -w net.core.optmem_max=4194304
sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"
sysctl -w net.ipv4.tcp_low_latency=1
sysctl -w net.ipv4.tcp_adv_win_scale=1
ethtool -K eth6 lro on
service irqbalance stop
NUMA_NODE=$(cat /sys/class/net/eth6/device/numa_node)    # NUMA node the NIC is attached to
set_irq_affinity_bynode.sh $NUMA_NODE eth6               # pin the NIC IRQs to that node's cores
cat /sys/devices/system/node/node${NUMA_NODE}/cpulist    # show that node's CPU list
TASKSET_VAR=$(cat /sys/devices/system/node/node${NUMA_NODE}/cpumap | rev | cut -f 1 -d, | rev)   # low 32 bits of that node's CPU mask, used for taskset
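As a sanity check, verifying that the eth6 interrupts really ended up on the cores of that NUMA node looks roughly like this (assuming the MSI-X vectors appear under the eth6 name in /proc/interrupts):
# Print the CPU affinity mask of every eth6 IRQ
grep eth6 /proc/interrupts | cut -d: -f1 | while read irq; do
    echo -n "IRQ $irq -> "
    cat /proc/irq/$irq/smp_affinity
done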
Finally, I run iperf like this:
taskset $TASKSET_VAR iperf -s # On server side (40.40.40.8)
taskset $TASKSET_VAR iperf -c40.40.40.8 -t43200 -i2 -P4 # On client side (40.40.40.7)
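While a test is running, checking that the iperf threads really stay on the cores from that mask looks roughly like this:
# PSR shows the core each iperf thread is currently running on
ps -eLo pid,lwp,psr,comm | grep iperf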
Another strange thing I observed is that when I try to use jumbo frames (MTU 9000 or 9600), I get worse performance than with the default MTU (1500).
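The jumbo-frame tests amount to nothing more than raising the MTU on both hosts before re-running iperf, e.g.:
ip link set dev eth6 mtu 9000     # and back to 1500 for the default setting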
Any suggestions on what else I should look for?