ConnectX-6 Dx NIC Performance Issue - rx_prio0_buf_discard Metric Increase

lu.qiuwen · November 15, 2023, 8:34am

Dear support,

We are seeing periodic increases in the ‘rx_prio0_buf_discard’ counter on our ConnectX-6 Dx network card, as reported by “ethtool -S” and “mlnx_perf -i”.

rx_vport_unicast_packets: 2,863,387.20
rx_vport_unicast_bytes: 824,148,379.49 Bps = 6,593.18 Mbps
tx_vport_unicast_packets: 6,041,914.23
tx_vport_unicast_bytes: 6,384,766,196.33 Bps = 51,078.12 Mbps
rx_vport_multicast_packets: 0.6
rx_vport_multicast_bytes: 18.83 Bps
tx_vport_multicast_packets: 0.3
tx_vport_multicast_bytes: 4.13 Bps
rx_vport_rdma_unicast_packets: 0.3
rx_vport_rdma_unicast_bytes: 4.65 Bps
tx_packets_phy: 6,041,912.56
rx_packets_phy: 2,863,706.27
tx_bytes_phy: 6,408,938,945.26 Bps = 51,271.44 Mbps
rx_bytes_phy: 824,243,574.75 Bps = 6,593.94 Mbps
rx_oversize_pkts_phy: .24
rx_64_bytes_phy: 413,800.61 Bps = 3.31 Mbps
rx_65_to_127_bytes_phy: 1,653,120.23
rx_128_to_255_bytes_phy: 254,090.74
rx_256_to_511_bytes_phy: 70,029.13
rx_512_to_1023_bytes_phy: 91,112
rx_1024_to_1518_bytes_phy: 381,465.45
rx_1519_to_2047_bytes_phy: 87.74
rx_2048_to_4095_bytes_phy: 1.17
rx_prio0_bytes: 824,245,239.68 Bps
rx_prio0_packets: 2,863,707.90
tx_prio0_bytes: 6,408,913,331.13 Bps = 51,271.30 Mbps
tx_prio0_packets: 6,041,897.65
rx_prio0_buf_discard: 1,155.6
UP 0: 51,271.36 Mbps = 100.00%
UP 0: 6,841,897.65 Tran/sec = 100.

We would appreciate your assistance in addressing the following concerns:

Does the increase in this metric indicate packet loss?
What could be the potential causes of this issue, and how can it be resolved? Would it require tuning or a firmware upgrade?

Here are the details of our environment for your reference:

Network Card Model: ConnectX-6 Dx
Firmware Version: 22.34.4000
Operating System: Rocky Linux 8.8
Application: Based on DPDK 21.11.4 with the mlx5 delay drop feature enabled. EAL Parameters: rxq_cqe_comp_en=4, allow_duplicate_pattern=0, delay_drop=0x1, rxq_pkt_pad_en=0x1, txq_inline_max=128, txq_inline_mpw=128

We have consulted the article “UNDERSTANDING MLX5 ETHTOOL COUNTERS” (https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters) to gain insights into the ‘rx_prio0_buf_discard’ metric. However, the explanations were unclear and lacked actionable guidance for resolving our issue.

Additionally, we tried the tuning tool ‘mlnx_tune’ with the ‘HIGH_THROUGHPUT’ profile. This did not help: the rx_prio0_buf_discard counter keeps increasing.

We would greatly appreciate any further assistance or recommendations you could provide to help us resolve this matter effectively.
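
For a sense of scale, the rates in the mlnx_perf sample above can be turned into a drop ratio. This is only a rough Python sketch: the function name is made up for illustration, and it assumes rx_prio0_buf_discard and rx_prio0_packets are both per-second rates from the same sampling interval.

```python
def discard_ratio(discards_per_sec: float, rx_packets_per_sec: float) -> float:
    """Discards relative to the received packet rate (both in packets/sec)."""
    if rx_packets_per_sec == 0:
        return 0.0
    return discards_per_sec / rx_packets_per_sec


# Rates copied from the mlnx_perf sample above.
rx_prio0_packets = 2_863_707.90
rx_prio0_buf_discard = 1_155.6

ratio = discard_ratio(rx_prio0_buf_discard, rx_prio0_packets)
print(f"discard ratio: {ratio:.4%}")
```

With the numbers above this comes out at about 0.04% of received packets, so the discards are small relative to the traffic but not zero.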

Levei_Luo · November 16, 2023, 7:44am

Hi,
The direct cause of rx_prio[i]_buf_discard is that the corresponding receive buffer in the NIC is full, because the NIC can’t move packets to the host quickly enough.

The root cause can be very broad. It might be back pressure from CPU, memory, PCIe, or application performance.

For DPDK, a good starting point is the DPDK performance report.

Tune according to its “Test Settings” first.

Some other tips:

Check the PCIe bandwidth: run lspci -vvv -s <bdf> and check whether ‘LnkSta’ matches ‘LnkCap’.
The CPU settings are very important. Consult the CPU vendor and make sure the CPU runs in high-performance mode.
Don’t use low-power mode or dynamic frequency scaling.
Check with the server vendor whether there are optimizations for the server (memory/PCIe).
Check the DPDK flow rules. If there are many rules, reduce them and retest to isolate whether the discards are caused by the rules.

Good Luck.
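
The LnkSta/LnkCap comparison from the tips above can be scripted. Below is a minimal Python sketch. The sample text is illustrative and not taken from any machine in this thread, the helper names are hypothetical, and the regex assumes the usual `Speed <n>GT/s ... Width x<n>` layout that lspci -vvv prints.

```python
import re

# Matches "LnkCap:" / "LnkSta:" lines only (not LnkCap2 / LnkSta2).
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def parse_link(lspci_text: str) -> dict:
    """Return {'LnkCap': (speed_gts, width), 'LnkSta': (speed_gts, width)}."""
    links = {}
    for line in lspci_text.splitlines():
        match = LINK_RE.search(line)
        if match:
            name, speed, width = match.groups()
            links.setdefault(name, (float(speed), int(width)))
    return links


def is_downgraded(links: dict) -> bool:
    """True if the negotiated link is slower or narrower than the capability."""
    cap_speed, cap_width = links["LnkCap"]
    sta_speed, sta_width = links["LnkSta"]
    return sta_speed < cap_speed or sta_width < cap_width


# Illustrative sample only: a Gen4 x16 device that trained at Gen3 speed.
SAMPLE = """\
\t\tLnkCap:\tPort #0, Speed 16GT/s, Width x16, ASPM not supported
\t\tLnkSta:\tSpeed 8GT/s (downgraded), Width x16 (ok)
"""

links = parse_link(SAMPLE)
print(links, "downgraded" if is_downgraded(links) else "ok")
```

On a real host, the text would come from running lspci -vvv -s <bdf> (typically as root, otherwise the capability lines may be hidden).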

lu.qiuwen · November 16, 2023, 8:19am

@Levei_Luo Thanks for your reply!

Is this problem related to the mlx5 delay drop feature? This feature is enabled when the phenomenon appears.

Thanks!

Levei_Luo · November 17, 2023, 6:43am

Do you mean PFC?
Yes, that’s possible too.
When PFC is enabled, the receive buffer is split into smaller buffers.
This makes discards more likely.

Use mlnx_qos -i to check the buffer size:

mlnx_qos -i eth0

DCBX mode: OS controlled
Priority trust state: pcp
default priority:
Receive buffer size (bytes): 0,156096,0,0,0,0,0,0,
…

Don’t manually change the buffer size directly on CX-6 Dx.

lu.qiuwen · November 17, 2023, 7:26am

The delay drop feature is described in the DPDK document: https://doc.dpdk.org/guides/nics/mlx5.html

delay_drop parameter [int]

Bitmask value for the Rx queue delay drop attribute. Bit 0 is used for the standard Rx queue and bit 1 is used for the hairpin Rx queue.

By default, the delay drop is disabled for all Rx queues. It will be ignored if the port does not support the attribute even if it is enabled explicitly. The packets being received will not be dropped immediately when the WQEs are exhausted in a Rx queue with delay drop enabled.

A timeout value is set in the driver to control the waiting time before dropping a packet. Once the timer is expired, the delay drop will be deactivated for all the Rx queues with this feature enabled. To re-activate it, a rearming is needed and it is part of the kernel driver starting from MLNX_OFED 5.5.

To enable / disable the delay drop rearming, the private flag dropless_rq can be set and queried via ethtool:

ethtool --set-priv-flags <netdev> dropless_rq on (/ off)

ethtool --show-priv-flags <netdev>

The configuration flag is global per PF and can only be set on the PF, once it is on, all the VFs’, SFs’ and representors’ Rx queues will share the timer and rearming.
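
To make the bitmask concrete, here is a small Python sketch that decodes a delay_drop value according to the description quoted above (bit 0 for standard Rx queues, bit 1 for hairpin Rx queues). The constant and function names are my own, not part of DPDK.

```python
STANDARD_RXQ = 0x1  # bit 0: standard Rx queues
HAIRPIN_RXQ = 0x2   # bit 1: hairpin Rx queues


def decode_delay_drop(mask: int) -> dict:
    """Report which Rx queue types a delay_drop bitmask enables."""
    return {
        "standard_rxq": bool(mask & STANDARD_RXQ),
        "hairpin_rxq": bool(mask & HAIRPIN_RXQ),
    }


# delay_drop=0x1 is the value used in our application.
print(decode_delay_drop(0x1))
```

So with delay_drop=0x1, only the standard Rx queues have delay drop requested, and hairpin queues are left at the default.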

lu.qiuwen · November 17, 2023, 7:32am

Here is the output of the mlnx_qos command for ens4f0np0. The output is the same whether the delay drop feature is enabled or not.

# mlnx_qos -i ens4f0np0
DCBX mode: OS controlled
Priority trust state: pcp
default priority:
Receive buffer size (bytes): 20352,0,0,0,0,0,0,0,
Cable len: 7
PFC configuration:
        priority    0   1   2   3   4   5   6   7
        enabled     0   0   0   0   0   0   0   0   
        buffer      0   0   0   0   0   0   0   0   
tc: 1 ratelimit: unlimited, tsa: vendor
         priority:  0
tc: 0 ratelimit: unlimited, tsa: vendor
         priority:  1
tc: 2 ratelimit: unlimited, tsa: vendor
         priority:  2
tc: 3 ratelimit: unlimited, tsa: vendor
         priority:  3
tc: 4 ratelimit: unlimited, tsa: vendor
         priority:  4
tc: 5 ratelimit: unlimited, tsa: vendor
         priority:  5
tc: 6 ratelimit: unlimited, tsa: vendor
         priority:  6
tc: 7 ratelimit: unlimited, tsa: vendor
         priority:  7
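
To compare this across hosts or settings, the relevant fields can be pulled out with a short script. This is a minimal Python sketch with a hypothetical helper name. It assumes the layout shown above (a “Receive buffer size (bytes):” line and an “enabled” row under “PFC configuration:”), which may differ between mlnx_qos versions.

```python
def parse_mlnx_qos(text: str) -> dict:
    """Extract receive buffer sizes and per-priority PFC flags."""
    result = {"rx_buffers": [], "pfc_enabled": []}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Receive buffer size (bytes):"):
            values = stripped.split(":", 1)[1]
            # The line ends with a trailing comma, so skip empty fields.
            result["rx_buffers"] = [int(v) for v in values.split(",") if v.strip()]
        elif stripped.startswith("enabled"):
            result["pfc_enabled"] = [int(v) for v in stripped.split()[1:]]
    return result


# First lines of the mlnx_qos -i ens4f0np0 output above.
SAMPLE = """\
DCBX mode: OS controlled
Priority trust state: pcp
default priority:
Receive buffer size (bytes): 20352,0,0,0,0,0,0,0,
Cable len: 7
PFC configuration:
        priority    0   1   2   3   4   5   6   7
        enabled     0   0   0   0   0   0   0   0
        buffer      0   0   0   0   0   0   0   0
"""

info = parse_mlnx_qos(SAMPLE)
print("rx buffers:", info["rx_buffers"])
print("PFC enabled on any priority:", any(info["pfc_enabled"]))
```

For the output above, PFC is not enabled on any priority, and only the first receive buffer has a non-zero size (20352 bytes).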

limingfeng · January 22, 2024, 9:53am

Hello,

I have encountered the same issue with packet drops when using dpdk-testpmd. Using the xstats command, I noticed that the discarded packets show up in rx_phy_discard_packets. I have tried several optimizations, but the problem persists.

lu.qiuwen · January 22, 2024, 10:04am

Hi, @limingfeng

Unfortunately, I did not find the final solution to fix this problem.

limingfeng · January 22, 2024, 10:13am

May I ask what is your CPU model? I think it might be related to AMD chips, is that possible?

lu.qiuwen · January 22, 2024, 10:16am

@limingfeng

My CPU model is AMD EPYC 7702 64-Core Processor, Dual Socket.

May I ask why do you think the problem is related to the AMD chips?

limingfeng · January 22, 2024, 10:19am

“I have contacted Dell’s FAE, and they mentioned that optimizing for AMD chips is quite complex. Similar experiences were shared in DPDK communication groups, suggesting a correlation with AMD chips.”

mpbhargav · December 19, 2024, 1:35pm

Running into the same issue (rx_prio0_buf_discard) on an AMD 7H12 with testpmd, with any firmware version newer than 22.28.1002.
