ConnectX-4 RX performance issues on DPDK


Are there any RX-side pps performance tips for the ConnectX-4 family / mlx5 PMD?

Our use case requires optimising RX pps; I don’t care about TX. Adding more receiving lcores actually decreases RX performance.

After applying the usual performance tips I am able to achieve 107 Mpps on the TX side (no RX) using one 5-tuple, or around 92 Mpps using 16 different 5-tuples for better RSS hashing.

However, I am not able to exceed 60 Mpps on the RX side even in a very favourable case, and I get around 18–37 Mpps in more typical cases. (Performance drops noticeably once the number of queues goes above 4.)

Running our DPDK application on 2x10G and 4x10G cards with other PMDs we see much more predictable performance scaling. I would have expected that with 8 RX lcores I would be close to 100 Mpps RX.

Test setup details:

  • testpmd + dpdk-pktgen or dpdk-pktgen alone
  • DPDK 17.11
  • one 2x100G OEM card Mellanox Technologies MT27700 Family [ConnectX-4], mt4115, FW upgraded to 12.21
  • two ports connected to each other via a 1 m MCP1600 copper DAC
  • PCIe 3.0 x16 slot, DevCtl MaxPayload 256 bytes, MaxReadReq 1024 bytes
  • E5-2650 v4 @ 2.20GHz CPU (12 cores), turbo disabled

I’m not expecting 148 Mpps here, but according to performance results from , the card should be able to do >90 Mpps full duplex using a single port.

I do use two ports, though: one for RX, one for TX.

Example commands:

./testpmd --file-prefix=820 --socket-mem=8192,8192 -l 12-23 -n 2 -w 0000:82:00.0,txq_inline=256 -- --port-topology=chained --forward-mode=rxonly --rss-udp --rxq=2 --txq=2 --nb-cores=8 --socket-num=1 --stats-period=1 --burst=128 --rxd=2048 --txd=512

./testpmd --file-prefix=820 --socket-mem=8192,8192 -l 12-23 -n 2 -w 0000:82:00.0,txq_inline=256 -- --port-topology=chained --forward-mode=rxonly --rss-udp --rxq=8 --txq=8 --nb-cores=8 --socket-num=1 --stats-period=1 --burst=128 --rxd=2048 --txd=512

./pktgen --file-prefix=both --socket-mem=28672,28672 -w 0000:82:00.0,txq_inline=256,txqs_min_inline=4 -w 0000:82:00.1,txq_inline=256,txqs_min_inline=4 -l 0-11,12-23 -n 4 -- -P -N -T -m "[1:12-15].0, [16-23:1].1"

Replying to myself; I hope somebody will find this useful.

  • moving the traffic path from card0/port0 -> card0/port1 to card0/port0 -> card1/port0 helped a lot
  • dpdk-pktgen requires some code tuning: more mbufs, larger bursts, etc.
  • dpdk-pktgen range traffic sometimes seems skewed / is not distributed evenly by the RX side’s RSS
  • I have had a better experience generating with testpmd in txonly mode, though it does not randomise IP addresses, and flowgen mode is very slow
  • be careful: testpmd requires #RX cores = #TX cores (it silently uses the MIN of the two numbers); in pktgen one can assign just a single core to the do-nothing RX side, which was better for txonly performance: ./pktgen --file-prefix=second --socket-mem=128,16384 -w 0000:82:00.0,txq_inline=128 -l 0,12-23 -n 2 -- -N -T -m "[12:13-23].0"
  • all in all, I was able to reach
    • around 85 Mpps rxonly using 8 cores (2.1 GHz, turbo off), and probably a little more (there were spare CPU cycles), as it absorbed 100% of the generator’s output;
    • up to 107 Mpps txonly using 11 local-NUMA cores plus some borrowed remote-NUMA cores

I am facing a similar issue with ConnectX-5 (dual-port 100G, PCIe 4).

I am running pktgen on another server, connected to the RX server via a 100G DAC (only one 100G port is used for testing). pktgen is generating 25 Mpps.

But the RX server is receiving at a rate of only 12–14 Mpps. I tried RSS, spreading traffic across 4 RX queues with a dedicated lcore reading from each queue, but the collective RX rate still remains 12–14 Mpps. No matter how many more queues I add, the total RX rate stays at 12–14 Mpps.

Any help would be highly appreciated. The RSS conf I used at the receiver side is given below:

.rx_adv_conf = {
    .rss_conf = {
        .rss_hf = ETH_RSS_IP | ETH_RSS_UDP |
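The snippet above is cut off after `ETH_RSS_UDP |`. For reference, a complete `rx_adv_conf` block in the DPDK 17.11-era API would look roughly like the sketch below — the trailing `ETH_RSS_TCP` flag and the `NULL` key are my assumptions, not the poster’s actual values:

```c
/* Sketch only: field names are from DPDK 17.11 rte_ethdev.h; the flag set
 * after ETH_RSS_UDP and the NULL key are assumptions, not the poster's conf. */
static const struct rte_eth_conf port_conf = {
    .rxmode = {
        .mq_mode = ETH_MQ_RX_RSS,   /* enable RSS spreading across RX queues */
    },
    .rx_adv_conf = {
        .rss_conf = {
            .rss_key = NULL,        /* NULL => use the PMD's (mlx5) default key */
            .rss_hf  = ETH_RSS_IP | ETH_RSS_UDP | ETH_RSS_TCP,
        },
    },
};
```

One thing worth checking in a setup like this: `rss_hf` decides which header fields feed the hash. If the generator varies only fields that are not covered (e.g. UDP ports while only IP hashing is effective), every packet hashes to the same queue and adding queues cannot raise the total RX rate.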