Are there any RX-side pps performance tips for the ConnectX-4 / mlx5 PMD family?
Our use case requires optimising RX pps; TX performance does not matter. Adding more receiving lcores actually decreases RX performance.
After applying the performance tips I am able to achieve 107 Mpps on the TX side (no RX) using one 5-tuple, or around 92 Mpps using 16 5-tuples for better RSS hashing.
However, I am not able to exceed 60 Mpps on the RX side in one very specific case, and I get only around 18-37 Mpps in more typical cases. (Performance drops sharply once the number of queues goes above 4.)
Running our DPDK application on 2x10G and 4x10G cards with different PMDs, we see much more predictable performance scaling. I would expect to get close to 100 Mpps RX with 8 RX lcores.
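To put the 100 Mpps expectation in perspective, here is a rough sketch of the per-packet cycle budget each RX lcore would have. It assumes RSS spreads the load perfectly evenly across lcores (which may not hold in practice) and uses the 2.20 GHz clock from the setup details below; the function name is just illustrative.

```python
# Rough per-packet CPU cycle budget per RX lcore.
# Assumption: traffic is spread perfectly evenly across all lcores.

def cycles_per_packet(cpu_hz, total_pps, n_lcores):
    """Cycles available per packet on each lcore."""
    per_core_pps = total_pps / n_lcores
    return cpu_hz / per_core_pps

# Target: 100 Mpps RX on 8 lcores at 2.20 GHz (turbo disabled).
target_budget = cycles_per_packet(2.2e9, 100e6, 8)

# Best observed case: 60 Mpps RX, also assuming 8 lcores.
observed_budget = cycles_per_packet(2.2e9, 60e6, 8)

print(f"target:   {target_budget:.0f} cycles/packet")
print(f"observed: {observed_budget:.0f} cycles/packet")
```

So reaching 100 Mpps with 8 lcores leaves about 176 cycles per packet per core, while the 60 Mpps case corresponds to roughly 293 cycles per packet.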
Test setup details:
- testpmd + dpdk-pktgen or dpdk-pktgen alone
- DPDK 17.11
- one 2x100G OEM card Mellanox Technologies MT27700 Family [ConnectX-4], mt4115, FW upgraded to 12.21
- the two ports connected to each other via a 1 m MCP1600 copper cable
- PCIe 3.0 x16 slot, DevCtl MaxPayload 256 bytes, MaxReadReq 1024 bytes
- E5-2650 v4 @ 2.20GHz CPU (12 cores), turbo disabled
I’m not expecting 148 Mpps here, but according to the performance results in http://fast.dpdk.org/doc/perf/DPDK_17_11_Mellanox_NIC_performance_report.pdf , the card should be able to do >90 Mpps full duplex on a single port.
I use two ports, though: one for RX and one for TX.
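For reference, the 148 Mpps figure is the theoretical rate for minimum-size (64-byte) frames on a 100G link. A minimal sketch of that arithmetic, assuming standard Ethernet framing overhead (8 bytes of preamble/SFD plus a 12-byte inter-frame gap per frame); the helper name is hypothetical:

```python
# Theoretical Ethernet packet rate for a given link speed and frame size.
# Assumption: standard per-frame wire overhead of preamble+SFD and IFG.

PREAMBLE_SFD_BYTES = 8
INTERFRAME_GAP_BYTES = 12

def line_rate_pps(link_bps, frame_bytes):
    """Maximum packets per second at the given link speed and frame size."""
    wire_bytes = frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES
    return link_bps / (wire_bytes * 8)

rate_100g = line_rate_pps(100e9, 64)
print(f"{rate_100g / 1e6:.1f} Mpps")  # 148.8 Mpps
```

The same formula gives about 14.88 Mpps for a 10G port, which is the usual reference point for the 2x10G and 4x10G cards.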
Example commands:
./testpmd --file-prefix=820 --socket-mem=8192,8192 -l 12-23 -n 2 -w 0000:82:00.0,txq_inline=256 -- --port-topology=chained --forward-mode=rxonly --rss-udp --rxq=2 --txq=2 --nb-cores=8 --socket-num=1 --stats-period=1 --burst=128 --rxd=2048 --txd=512
./testpmd --file-prefix=820 --socket-mem=8192,8192 -l 12-23 -n 2 -w 0000:82:00.0,txq_inline=256 -- --port-topology=chained --forward-mode=rxonly --rss-udp --rxq=8 --txq=8 --nb-cores=8 --socket-num=1 --stats-period=1 --burst=128 --rxd=2048 --txd=512
./pktgen --file-prefix=both --socket-mem=28672,28672 -w 0000:82:00.0,txq_inline=256,txqs_min_inline=4 -w 0000:82:00.1,txq_inline=256,txqs_min_inline=4 -l 0-11,12-23 -n 4 -- -P -N -T -m "[1:12-15].0, [16-23:1].1"
…