I’ve set up a test system with two Dell R730 servers, each with a ConnectX-3 Pro NIC, connected by a 40GbE cable. I followed the BIOS tuning guide for the R730 ( https://community.mellanox.com/s/article/bios-performance-tuning-example-for-dell-poweredge-r730 ) and the VMA Performance Tuning Guide ( https://community.mellanox.com/s/article/vma-performance-tuning-guide ) very carefully and made sure I understood everything I did. I then ran the VMA latency test with:
sudo LD_PRELOAD=libvma.so VMA_SPEC=latency numactl --cpunodebind=1 taskset -c 33 sockperf sr -i 192.168.48.2
and
sudo LD_PRELOAD=libvma.so VMA_SPEC=latency numactl --cpunodebind=1 taskset -c 33 sockperf pp -i 192.168.48.2 -t 10
I’ve checked that the NICs are in the right slots, with x16 PCIe link width, and are attached to NUMA node 1. However, the test gives a surprisingly high maximum latency of about 160us, while the average is only about 1us:
Test Result of UDP ping-pong with VMA
sockperf: ---> <MAX> observation = 162.336
sockperf: ---> percentile 99.999 = 6.488
sockperf: ---> percentile 99.990 = 4.949
sockperf: ---> percentile 99.900 = 2.099
sockperf: ---> percentile 99.000 = 1.705
sockperf: ---> percentile 90.000 = 1.409
sockperf: ---> percentile 75.000 = 1.356
sockperf: ---> percentile 50.000 = 1.179
sockperf: ---> percentile 25.000 = 1.135
sockperf: ---> <MIN> observation = 1.075
So I was wondering what could cause this very high worst-case latency, and what could be done to reduce it. I also ran the same test without VMA. It gives a higher average latency of about 6us, but the worst-case latency is much lower, at about 21us:
Test Result of UDP ping-pong without VMA
sockperf: ---> <MAX> observation = 21.201
sockperf: ---> percentile 99.999 = 9.604
sockperf: ---> percentile 99.990 = 8.219
sockperf: ---> percentile 99.900 = 7.626
sockperf: ---> percentile 99.000 = 6.796
sockperf: ---> percentile 90.000 = 6.318
sockperf: ---> percentile 75.000 = 6.147
sockperf: ---> percentile 50.000 = 5.937
sockperf: ---> percentile 25.000 = 5.848
sockperf: ---> <MIN> observation = 5.561
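To put the two runs side by side, here is a rough Python sketch that compares the maximum against the median and the 99.999th percentile. It is only an illustration: the function names (parse_sockperf, tail_summary) are made up for this post, the regular expressions assume the output layout quoted above, and the sample strings are a subset of the lines from the two results.

```python
import re

# Subset of the sockperf lines quoted in this post (latencies in usec).
VMA_OUTPUT = """\
sockperf: ---> <MAX> observation = 162.336
sockperf: ---> percentile 99.999 = 6.488
sockperf: ---> percentile 50.000 = 1.179
sockperf: ---> <MIN> observation = 1.075
"""

KERNEL_OUTPUT = """\
sockperf: ---> <MAX> observation = 21.201
sockperf: ---> percentile 99.999 = 9.604
sockperf: ---> percentile 50.000 = 5.937
sockperf: ---> <MIN> observation = 5.561
"""

# Assumed line layout: "percentile <p> = <value>" or "observation = <value>".
PERCENTILE_RE = re.compile(r"percentile\s+([\d.]+)\s*=\s*([\d.]+)")
OBSERVATION_RE = re.compile(r"observation\s*=\s*([\d.]+)")


def parse_sockperf(text):
    """Return ({percentile: latency}, [observation latencies])."""
    percentiles = {}
    observations = []
    for line in text.splitlines():
        m = PERCENTILE_RE.search(line)
        if m:
            percentiles[float(m.group(1))] = float(m.group(2))
            continue
        m = OBSERVATION_RE.search(line)
        if m:
            observations.append(float(m.group(1)))
    return percentiles, observations


def tail_summary(text):
    """Compare the largest observation with the median and p99.999."""
    percentiles, observations = parse_sockperf(text)
    worst = max(observations)  # the larger observation is the maximum
    median = percentiles[50.0]
    p99999 = percentiles[99.999]
    return {
        "max": worst,
        "median": median,
        "p99.999": p99999,
        "max_over_median": worst / median,
        "max_over_p99.999": worst / p99999,
    }


if __name__ == "__main__":
    for name, text in (("with VMA", VMA_OUTPUT), ("without VMA", KERNEL_OUTPUT)):
        s = tail_summary(text)
        print(f"{name}: max={s['max']:.3f} median={s['median']:.3f} "
              f"max/median={s['max_over_median']:.1f} "
              f"max/p99.999={s['max_over_p99.999']:.1f}")
```

With VMA the maximum is roughly 25 times the 99.999th percentile, while without VMA it is only a little over 2 times. That suggests the 160us figure comes from a very small number of outlier samples rather than from a generally wider tail.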
So is this caused by VMA itself, or is there something else to suspect? The OS is Ubuntu 14.04 with the low-latency 3.17 kernel.
Any advice is appreciated!
Regards,
Hongyuan