Can anyone point me to a good example of using "iperf" with Mellanox Ethernet switch and HCAs?

Hi, I have a setup with a Mellanox SX1036 Ethernet switch and 2 servers, each with a Mellanox ConnectX-2 HCA card. Each HCA has one of its (Ethernet) network ports connected to a port on the SX1036 switch and configured for 10 Gbps; the links are up and running. I’d like to use “iperf” to do some network performance testing, so I’ve installed iperf on both servers and am ready to go. I was hoping that someone could point me to good examples of using “iperf” with a similar setup, i.e. servers with Mellanox HCAs communicating through a Mellanox switch. Thanks!

Branko,

Thanks so much for the question. Let me see who I can get to jump on this for you.

A good question deserves a good answer:

I would start by recommending Mellanox OFED 2.0 (http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers). The guys in the shop have been working very hard on performance improvements for IPoIB and Ethernet as well.

You can also use the tuning guide http://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters_v1.7.pdf to tighten things up further.

Here are a few recommended steps for tuning and measuring with iperf/netperf:

  • Set IPoIB to run in datagram mode

echo datagram > /sys/class/net/ibX/mode

  • Set the HCA Port IRQ Affinity

  • Disable IRQ Balancer service (enabled by default on RH/OEL):

chkconfig irqbalance off

/etc/init.d/irqbalance stop

  • Use Mellanox script to distribute the IRQ vectors among the “close” cores.

For example:

/usr/sbin/set_irq_affinity_bynode.sh X mlx4-ib-

/usr/sbin/set_irq_affinity_bynode.sh X mlx4-comp

  • Where “X” is the node close to the HCA being tested:

cat /sys/class/net/ibN/device/numa_node

  • And “Y” is the port number (1st port is 1, 2nd port is 2).
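To check that the affinity script did what you expected, one option is to list the IRQs whose name matches the prefix and print the cores each one is allowed on. This is only a sketch: the helper names `irqs_matching` and `show_affinity` are made up for illustration, and it assumes the driver's vectors appear in /proc/interrupts with names containing the pattern you pass (for example "mlx4-comp") and that your kernel exposes /proc/irq/N/smp_affinity_list.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Mellanox OFED) for inspecting IRQ affinity.

# Print the numeric IRQs whose /proc/interrupts line contains the given pattern.
# The file argument is optional and only there so the function can be tried
# against a saved copy of /proc/interrupts.
irqs_matching() {
    pattern=$1
    file=${2:-/proc/interrupts}
    awk -v pat="$pattern" '
        index($0, pat) && $1 ~ /^[0-9]+:$/ { sub(":", "", $1); print $1 }
    ' "$file"
}

# Print "IRQ: allowed cores" for every matching IRQ on the running system.
show_affinity() {
    for irq in $(irqs_matching "$1"); do
        printf '%s: %s\n' "$irq" "$(cat "/proc/irq/$irq/smp_affinity_list")"
    done
}

# Example: show_affinity mlx4-comp
```

If the cores printed are not the ones local to the HCA's NUMA node, the affinity script probably matched a different prefix than the one your driver uses.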

  • Pin the application processes on the same node.

  • To get the list of cores on node X, run:

cat /sys/devices/system/node/nodeX/cpulist

For example:

cat /sys/devices/system/node/node1/cpulist

8-15

  • Then pin the application using taskset utility, for example:

taskset -c 0,1,2,3,4,5,6,7 iperf -s

taskset -c 0,1,2,3,4,5,6,7 iperf -c <server_ip> -l 64k -P 8

Note that some applications provide command line flags for core pinning, for example:

netperf -T
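The node lookup and the pinning can be combined so the core list is not typed by hand. The following is a rough sketch under a few assumptions: `local_cpulist` and the `SYSFS_ROOT` override are names invented here (the override only exists so the function can be tried against a fake directory tree), `eth2` and `<server_ip>` are placeholders, and a `numa_node` of -1 (no affinity reported) is treated as node 0.

```shell
#!/bin/sh
# Hypothetical helper: print the CPU list local to an interface's NUMA node.
local_cpulist() {
    iface=$1
    root=${SYSFS_ROOT:-/sys}   # assumption: overridable for dry runs
    node=$(cat "$root/class/net/$iface/device/numa_node" 2>/dev/null) || return 1
    # The kernel reports -1 when it has no NUMA affinity for the device;
    # fall back to node 0 in that case (a choice made for this sketch).
    [ "$node" -ge 0 ] 2>/dev/null || node=0
    cat "$root/devices/system/node/node$node/cpulist"
}

# Example usage (interface name and server address are placeholders):
#   server:  cpus=$(local_cpulist eth2) && taskset -c "$cpus" iperf -s
#   client:  cpus=$(local_cpulist eth2) && \
#            taskset -c "$cpus" iperf -c <server_ip> -l 64k -P 8 -t 30 -i 5
```

In the client line, `-t 30` runs the test for 30 seconds and `-i 5` prints a report every 5 seconds; both values are arbitrary and only there to give the parallel streams time to settle.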

  • IPv4 sysctl Modifications:

sysctl -w net.ipv4.tcp_timestamps=0

sysctl -w net.ipv4.tcp_sack=0

sysctl -w net.core.netdev_max_backlog=250000

sysctl -w net.core.rmem_max=16777216

sysctl -w net.core.wmem_max=16777216

sysctl -w net.core.rmem_default=16777216

sysctl -w net.core.wmem_default=16777216

sysctl -w net.core.optmem_max=16777216

sysctl -w net.ipv4.tcp_mem="16777216 16777216 16777216"

sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"

sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

sysctl -w net.ipv4.tcp_low_latency=1
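Settings made with `sysctl -w` only last until the next reboot. If you want to re-apply the same set in one step on both servers, one possibility is to keep the values in a file and load it with `sysctl -p`. This is a sketch rather than an official procedure: the function name `tuning_conf` and the file path are made up, and the values are simply the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: emit the tuning values from this thread in
# sysctl.conf syntax ("key = value", one per line).
tuning_conf() {
    cat <<'EOF'
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
net.core.netdev_max_backlog = 250000
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 16777216
net.ipv4.tcp_mem = 16777216 16777216 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_low_latency = 1
EOF
}

# Example (run as root; the path is a placeholder):
#   tuning_conf > /tmp/iperf-tuning.conf
#   sysctl -p /tmp/iperf-tuning.conf
```

Keeping the values in a file also makes it easy to diff the two servers and confirm they were tuned identically before comparing iperf results.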

I hope it helps. Good luck!

Thanks very much for your detailed response Yairi! Much appreciated and very useful information!