Why is the MAXIMUM latency so high in a UDP ping-pong test?

I’ve set up a test system with two Dell R730 servers, each with a ConnectX-3 Pro NIC, connected by a 40GbE cable. I’ve followed the BIOS tuning guide for the R730 ( https://community.mellanox.com/s/article/bios-performance-tuning-example-for-dell-poweredge-r730 ) and the VMA Performance Tuning Guide ( https://community.mellanox.com/s/article/vma-performance-tuning-guide ) very carefully and made sure I understood everything I did. Then I ran the VMA latency test, with this on the server side:

sudo LD_PRELOAD=libvma.so VMA_SPEC=latency numactl --cpunodebind=1 taskset -c 33 sockperf sr -i 192.168.48.2

and this on the client side:

sudo LD_PRELOAD=libvma.so VMA_SPEC=latency numactl --cpunodebind=1 taskset -c 33 sockperf pp -i 192.168.48.2 -t 10

I’ve checked that the NICs are in the right slots with x16 PCIe width and are attached to NUMA node #1. However, the test gives me a surprisingly high MAXIMUM latency of about 160us, while the average is only about 1us:

Test Result of UDP ping-pong with VMA

sockperf: ---> <MAX> observation = 162.336

sockperf: ---> percentile 99.999 = 6.488

sockperf: ---> percentile 99.990 = 4.949

sockperf: ---> percentile 99.900 = 2.099

sockperf: ---> percentile 99.000 = 1.705

sockperf: ---> percentile 90.000 = 1.409

sockperf: ---> percentile 75.000 = 1.356

sockperf: ---> percentile 50.000 = 1.179

sockperf: ---> percentile 25.000 = 1.135

sockperf: ---> <MIN> observation = 1.075

So I was wondering what could be causing this very high worst-case latency, and what could be done to reduce it. I’ve also run the same test without VMA. It gives me a higher average latency of about 6us, but the worst-case latency is not nearly as bad:

Test Result of UDP ping-pong without VMA

sockperf: ---> <MAX> observation = 21.201

sockperf: ---> percentile 99.999 = 9.604

sockperf: ---> percentile 99.990 = 8.219

sockperf: ---> percentile 99.900 = 7.626

sockperf: ---> percentile 99.000 = 6.796

sockperf: ---> percentile 90.000 = 6.318

sockperf: ---> percentile 75.000 = 6.147

sockperf: ---> percentile 50.000 = 5.937

sockperf: ---> percentile 25.000 = 5.848

sockperf: ---> <MIN> observation = 5.561

So is this caused by VMA itself, or is there something else I should suspect? The OS I am using is Ubuntu 14.04 with the low-latency kernel 3.17.

Any advice is appreciated!

Regards,

Hongyuan

This may be caused by some kind of warm-up when the first packets are sent; you can see something similar, on a smaller scale, when VMA is not used (a maximum of 21.201 against a median of 5.937). What you are really interested in is the average latency, which is not shown in your output; judging by the percentiles, it is much smaller when VMA is used.

To understand it better, you could patch the sockperf and VMA code to record when and where the latency spikes, and from there work out where the time is spent.
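Before patching anything, a less invasive first step may be to check the warm-up idea from per-packet data. The following is only a sketch in Python, not something taken from sockperf or VMA. It assumes you can get a per-packet log in which each data line holds a send time and a receive time in seconds, separated by a comma (I believe sockperf can write such a file with its --full-log option, but please check sockperf pp --help on your build, since the exact layout is an assumption here). The function names parse_samples and worst_samples are made up for this example. Latency is computed as half the round trip, which is my understanding of how sockperf reports ping-pong results.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[float, float]  # (send time, receive time), both in seconds


def parse_samples(lines: Iterable[str]) -> List[Sample]:
    """Collect (tx, rx) pairs from lines of the assumed form 'tx, rx'.

    Any line that does not consist of exactly two comma-separated numbers
    (headers, separators, blank lines) is skipped.
    """
    samples: List[Sample] = []
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2:
            continue
        try:
            samples.append((float(parts[0]), float(parts[1])))
        except ValueError:
            continue
    return samples


def worst_samples(samples: List[Sample], top: int = 5) -> List[Tuple[int, float, float]]:
    """Return the `top` slowest packets, worst first.

    Each entry is (packet index, seconds since the first send,
    latency in microseconds computed as round trip / 2).
    """
    if not samples:
        return []
    t0 = samples[0][0]
    rows = [
        (i, tx - t0, (rx - tx) / 2 * 1e6)
        for i, (tx, rx) in enumerate(samples)
    ]
    return sorted(rows, key=lambda r: r[2], reverse=True)[:top]
```

Calling worst_samples(parse_samples(open(path))) on such a log (path being wherever you saved it) lists the slowest packets with their position in the run. If the worst ones all sit at index 0 or within the first few milliseconds, that supports the warm-up explanation; if they are scattered through the run, something periodic on the host (interrupts, timers, other tasks on the core) becomes a more likely suspect.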