Packet loss with multi-frame payloads

system · July 17, 2017, 7:26pm

Hello,

I am having a problem with packet loss in my DPDK application and I hope you can help me out. Below is a description of the application and of the problem.

It is a little long, but I really hope somebody out there can help me, because this is driving me crazy.

Application

I have a client-server application; single server, multiple clients.

The machines have 8 active cores which poll 8 distinct RX queues to receive packets and use 8 distinct TX queues to burst out packets (i.e., run-to-completion model).

Workload

The workload is composed mostly of single-frame packets, but occasionally clients send multi-frame packets to the server, and occasionally the server sends multi-frame replies back to the client.

Packets are fragmented at the UDP level (i.e., there is no IP fragmentation: every frame of the same request has frag_id == 0, even though all of them share the same packet_id).
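A minimal sketch of how the receiving side could track which frames of a multi-frame request have arrived, which makes it possible to log exactly how many frames went missing. The post does not show the application header, so the struct layout, the `frag_index` and `frag_count` fields, the `MAX_FRAGS` bound, and all function names below are hypothetical; only `packet_id` comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAGS 1024 /* hypothetical upper bound on frames per request */

/* Hypothetical application header at the start of every UDP payload. */
struct app_frag_hdr {
    uint32_t packet_id;  /* shared by all frames of one request */
    uint16_t frag_index; /* 0-based position of this frame (assumed field) */
    uint16_t frag_count; /* total frames in the request (assumed field) */
};

/* Per-request reassembly state: one bit per expected frame. */
struct reasm_state {
    uint32_t packet_id;
    uint16_t frag_count;
    uint16_t received;
    uint8_t seen[MAX_FRAGS / 8];
};

/* Start tracking a request, using the header of any of its frames. */
static void reasm_init(struct reasm_state *st, const struct app_frag_hdr *h)
{
    assert(h->frag_count > 0 && h->frag_count <= MAX_FRAGS);
    memset(st, 0, sizeof(*st));
    st->packet_id = h->packet_id;
    st->frag_count = h->frag_count;
}

/*
 * Record one frame and return true once every frame of the request
 * has been seen. Duplicates, out-of-range indexes, and frames that
 * belong to another request leave the state unchanged.
 */
static bool reasm_add(struct reasm_state *st, const struct app_frag_hdr *h)
{
    if (h->packet_id == st->packet_id && h->frag_index < st->frag_count) {
        uint8_t mask = (uint8_t)(1u << (h->frag_index % 8));
        if (!(st->seen[h->frag_index / 8] & mask)) {
            st->seen[h->frag_index / 8] |= mask;
            st->received++;
        }
    }
    return st->received == st->frag_count;
}

/* Number of frames still outstanding for this request. */
static uint16_t reasm_missing(const struct reasm_state *st)
{
    return (uint16_t)(st->frag_count - st->received);
}
```

Calling `reasm_missing()` when a request times out gives a per-request loss count that can be compared against the NIC counters.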

Problem

I experience huge packet loss on the server when the occasional multi-frame requests from the clients carry a big payload (> 300 Kb).

The eth stats that I gather on the server say that there is no error, nor any packet loss (q_errors, imissed, ierrors, oerrors, rx_nombuf are all equal to 0). Yet, the application is not seeing some packets of big requests that the clients send.

I have recorded some interesting facts:

The clients do not experience such packet loss, although they also receive packets with an aggregate payload of the same size as the packets received by the server. The only differences w.r.t. the server are that a client machine of course has a lower RX load (it only gets the replies to its own requests) and that a client thread only receives packets from a single machine (the server).
This behavior does not arise as long as the biggest payload exchanged between clients and server is < 200 Kb. This leads me to conclude that fragmentation is not the issue (also, if I implement a stubborn retransmission, eventually all packets are received even with bigger payloads). Also, I reserve plenty of memory for my mempool, so I don’t think the server runs out of mbufs (and if that were the case I guess I would see this in the dropped packets count, right?).
If I switch to the pipeline model (on the server only) this problem basically disappears. By pipeline model I mean something like the load-balancing app, where a single core on the server receives client packets on a single RX queue (worker cores reply back to the client using their own TX queue). This leads me to think that the problem is on the server, and not on the clients.
It doesn’t seem to be a “load” problem. If I run the same tests multiple times, in some “lucky” runs the run-to-completion model outperforms the pipeline one. Also, the run-to-completion model with single-frame packets can handle a number of single-frame packets per second that is much higher than the number of frames per second generated by the workload with some big packets.
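To put the payload sizes above in terms of frames, here is a small sketch of the arithmetic. It assumes things the post does not state: that "Kb" means kilobytes of 1024 bytes, that the MTU is the standard 1500 bytes, that frames are IPv4 without options, and that the application header is small enough to ignore. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HDR_LEN 20 /* IPv4 header without options (assumption) */
#define UDP_HDR_LEN 8

/* Frames needed to carry payload_bytes when each frame is one UDP datagram. */
static size_t frames_for_payload(size_t payload_bytes, size_t mtu)
{
    assert(mtu > IPV4_HDR_LEN + UDP_HDR_LEN);
    size_t per_frame = mtu - IPV4_HDR_LEN - UDP_HDR_LEN; /* 1472 at MTU 1500 */
    return (payload_bytes + per_frame - 1) / per_frame;  /* round up */
}
```

Under these assumptions, `frames_for_payload(300 * 1024, 1500)` gives 209 frames and `frames_for_payload(200 * 1024, 1500)` gives 140. If a request's frames all land on the same RX queue, one thing that may be worth comparing against these numbers is the `nb_rx_desc` value passed to `rte_eth_rx_queue_setup()`. The thread itself does not establish that ring size is the cause.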

Question

Do you have any idea why I am witnessing this behavior? I know that having fewer queues can help performance by relieving contention on the NIC, but is it possible that the contention is actually causing packets to get dropped?

Platform

DPDK: v 2.2-0 (I know this is an old version, but I am dealing with legacy code I cannot change)

MLNX_OFED_LINUX-3.1-1.0.3-ubuntu14.04-x86_64

My NIC : Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

My machine runs kernel 4.4.0-72-generic on Ubuntu 16.04.02

CPU is Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz 2x8 cores

Thanks a lot, especially if you went through the whole message.

Regards,

Harold

MvB · August 10, 2017, 1:27am

Hi Harold,

Thank you for posting this question on our Community and thank you for your patience in this matter.

We reviewed your information with our developers. Our recommendation (because you cannot upgrade to the latest DPDK version) is to update to Mellanox DPDK version 2.2_4.2. This version delivers enhancements for Mellanox NICs and bug fixes on top of DPDK 2.2-0.

You can download Mellanox DPDK version 2.2_4.2 through the following link: http://www.mellanox.com/downloads/Drivers/MLNX_DPDK_2.2_4.2.tar.gz

The Quick Start Guide ( http://www.mellanox.com/related-docs/prod_software/MLNX_DPDK_Quick_Start_Guide_v2.2_4.2.pdf ) provides all the information needed to install Mellanox DPDK and to tune the system for performance (section 3.0).

The Release Notes ( http://www.mellanox.com/related-docs/prod_software/MLNX_DPDK_Release_Notes_v2.2_4.2.pdf ) provide the system requirements for using Mellanox DPDK.

If this does not resolve your issue, we advise you to open a support case with Mellanox Support.

Thanks.

Cheers,

~Martijn
