Can anyone point me to a good example of using "iperf" with Mellanox Ethernet switch and HCAs?

Hi, I have a setup with a Mellanox SX1036 Ethernet switch and 2 servers, each with a Mellanox ConnectX-2 HCA card. Each HCA has one of its (Ethernet) network ports connected to a port on the SX1036 switch, and the links are configured for 10 Gbps and are up and running. I’d like to use “iperf” to do some network performance testing, so I’ve installed iperf on both servers and am ready to go. I was hoping that someone could point me to good examples of using “iperf” with a similar setup, i.e. servers with Mellanox HCAs communicating through a Mellanox switch. Thanks!


Thanks so much for the question. Let me see who I can get to jump on this for you.

A good question deserves a good answer:

I would start by recommending Mellanox OFED 2.0; the team has been working very hard on performance improvements for both IPoIB and Ethernet.

You can also use the tuning guide to tighten things up further.

Here are a few recommended steps for tuning and measuring with iperf/netperf:

  • Set IPoIB to run in datagram mode

echo datagram > /sys/class/net/ibX/mode

  • Set the HCA Port IRQ Affinity

  • Disable IRQ Balancer service (enabled by default on RH/OEL):

chkconfig irqbalance off

/etc/init.d/irqbalance stop

  • Use the Mellanox script to distribute the IRQ vectors among the “close” cores.

For example:

/usr/sbin/ X mlx4-ib-

/usr/sbin/ X mlx4-comp

  • Where “X” is the NUMA node closest to the HCA being tested, as reported by:

cat /sys/class/net/ibN/device/numa_node

  • And “Y” is the port number (the 1st port is 1, the 2nd port is 2).

  • Pin the application processes to the same node.

  • To get the list of cores on node X, run:

cat /sys/devices/system/node/nodeX/cpulist

For example:

cat /sys/devices/system/node/node1/cpulist


  • Then pin the application using the taskset utility, for example (server first, then client pointing at the server’s IP):

taskset -c 0,1,2,3,4,5,6,7 iperf -s

taskset -c 0,1,2,3,4,5,6,7 iperf -c <server_ip> -l 64k -P 8

Note that some applications provide command-line flags for core pinning, for example:

netperf -T
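The NUMA-node lookup and core-pinning steps above can be sketched as a small shell helper. This is only a sketch, not an official Mellanox tool; the interface name `eth2` and the `SYSROOT` override are assumptions for illustration (the demo builds a mock sysfs tree so it runs on any machine — on a real host keep `SYSROOT=/sys`):

```shell
# Sketch: resolve the NUMA node local to a NIC and print that node's core
# list, in a form suitable for `taskset -c`.
pin_cores() {
    iface=$1
    sysroot=${SYSROOT:-/sys}
    node=$(cat "$sysroot/class/net/$iface/device/numa_node")
    [ "$node" -lt 0 ] && node=0   # numa_node is -1 when NUMA info is absent
    cat "$sysroot/devices/system/node/node$node/cpulist"
}

# Demo against a mock sysfs tree (assumed layout, for illustration only):
SYSROOT=$(mktemp -d)
mkdir -p "$SYSROOT/class/net/eth2/device" "$SYSROOT/devices/system/node/node1"
echo 1    > "$SYSROOT/class/net/eth2/device/numa_node"
echo 8-15 > "$SYSROOT/devices/system/node/node1/cpulist"

CORES=$(pin_cores eth2)
echo "cores: $CORES"
# On a real host:
#   taskset -c "$(SYSROOT=/sys pin_cores eth2)" iperf -s
```

taskset accepts the same list/range syntax (`8-15`, `0,2,4`) that the kernel exposes in `cpulist`, so the output can be passed through unchanged.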

  • IPv4 sysctl Modifications:

sysctl -w net.ipv4.tcp_timestamps=0

sysctl -w net.ipv4.tcp_sack=0

sysctl -w net.core.netdev_max_backlog=250000

sysctl -w net.core.rmem_max=16777216

sysctl -w net.core.wmem_max=16777216

sysctl -w net.core.rmem_default=16777216

sysctl -w net.core.wmem_default=16777216

sysctl -w net.core.optmem_max=16777216

sysctl -w net.ipv4.tcp_mem="16777216 16777216 16777216"

sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"

sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

sysctl -w net.ipv4.tcp_low_latency=1
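Note that `sysctl -w` changes are lost on reboot. To persist them, the same settings can go in a sysctl drop-in file; the sketch below writes to a temp file so it can run without root, and the drop-in filename in the comment is illustrative (on a real host, place the file under /etc/sysctl.d/ and load it with `sysctl -p <file>` as root):

```shell
# Sketch: collect the sysctl tuning values above into one drop-in file.
CONF=$(mktemp)   # stand-in for e.g. /etc/sysctl.d/90-mlnx-tuning.conf
cat > "$CONF" <<'EOF'
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
net.core.netdev_max_backlog = 250000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 16777216
net.ipv4.tcp_mem = 16777216 16777216 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_low_latency = 1
EOF
wc -l < "$CONF"   # 12 settings
# Apply (as root):  sysctl -p "$CONF"
```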

I hope it helps. Good luck!

Thanks very much for your detailed response, Yairi! Much appreciated and very useful information!