ConnectX-6 Dx NIC Performance Issue - rx_prio0_buf_discard Metric Increase

Dear support,

We are encountering an issue with our ConnectX-6 Dx network card: the ‘rx_prio0_buf_discard’ counter, read with “ethtool -S” and “mlnx_perf -i”, increases periodically. Sample mlnx_perf output:

rx_vport_unicast_packets: 2,863,387.20
rx_vport_unicast_bytes: 824,148,379.49 Bps = 6,593.18 Mbps
tx_vport_unicast_packets: 6,041,914.23
tx_vport_unicast_bytes: 6,384,766,196.33 Bps = 51,078.12 Mbps
rx_vport_multicast_packets: 0.6
rx_vport_multicast_bytes: 18.83 Bps
tx_vport_multicast_packets: 0.3
tx_vport_multicast_bytes: 4.13 Bps
rx_vport_rdma_unicast_packets: 0.3
rx_vport_rdma_unicast_bytes: 4.65 Bps
tx_packets_phy: 6,041,912.56
rx_packets_phy: 2,863,706.27
tx_bytes_phy: 6,408,938,945.26 Bps = 51,271.44 Mbps
rx_bytes_phy: 824,243,574.75 Bps = 6,593.94 Mbps
rx_oversize_pkts_phy: 0.24
rx_64_bytes_phy: 1,653,120.23
rx_65_to_127_bytes_phy: 254,090.74
rx_128_to_255_bytes_phy: 70,029.13
rx_256_to_511_bytes_phy: 91,112
rx_512_to_1023_bytes_phy: 413,800.61
rx_1024_to_1518_bytes_phy: 381,465.45
rx_1519_to_2047_bytes_phy: 87.74
rx_2048_to_4095_bytes_phy: 1.17
rx_prio0_bytes: 824,245,239.68 Bps
rx_prio0_packets: 2,863,707.90
tx_prio0_bytes: 6,408,913,331.13 Bps = 51,271.30 Mbps
tx_prio0_packets: 6,041,897.65
rx_prio0_buf_discard: 1,155.6
UP 0: 51,271.36 Mbps = 100.00%
UP 0: 6,041,897.65 Tran/sec = 100.00%

We would appreciate your assistance in addressing the following concerns:

  1. Does the increase in this metric indicate packet loss?
  2. What could be causing this, and how can it be resolved? Would it require tuning or a firmware upgrade?
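For a sense of scale on question 1, the sampled per-second rates above can be compared directly. This is a minimal sketch that assumes rx_prio0_packets is a reasonable denominator for the offered load; whether that counter already excludes the discarded packets is not stated in this thread.

```python
# Per-second rates copied from the mlnx_perf sample above.
RX_PRIO0_PACKETS_PER_S = 2_863_707.90
RX_PRIO0_BUF_DISCARD_PER_S = 1_155.6


def discard_fraction(discards_per_s: float, packets_per_s: float) -> float:
    """Approximate share of priority-0 traffic dropped in the NIC Rx buffer.

    Assumption: packets_per_s is used as-is. If that counter excludes the
    discarded packets, the true ratio d / (p + d) is very slightly lower.
    """
    if packets_per_s <= 0:
        raise ValueError("packets_per_s must be positive")
    return discards_per_s / packets_per_s


if __name__ == "__main__":
    frac = discard_fraction(RX_PRIO0_BUF_DISCARD_PER_S, RX_PRIO0_PACKETS_PER_S)
    print(f"rx_prio0 discard ratio: {frac:.4%}")  # ~0.0404% of packets
```

A small ratio still means roughly a thousand packets per second never reach an Rx queue during such an interval.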

Here are the details of our environment for your reference:

  1. Network Card Model: ConnectX-6 Dx
  2. Firmware Version: 22.34.4000
  3. Operating System: Rocky Linux 8.8
  4. Application: based on DPDK 21.11.4, with the mlx5 delay drop feature enabled. mlx5 device arguments (passed through the EAL): rxq_cqe_comp_en=4, allow_duplicate_pattern=0, delay_drop=0x1, rxq_pkt_pad_en=0x1, txq_inline_max=128, txq_inline_mpw=128
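In DPDK 21.11 these mlx5 device arguments are appended to the PCI address given to the EAL -a (allow) option. A minimal sketch of assembling that string; the helper name and the PCI address 0000:3b:00.0 are hypothetical.

```python
from typing import Dict

# mlx5 device arguments listed above, in the same order.
MLX5_DEVARGS: Dict[str, str] = {
    "rxq_cqe_comp_en": "4",
    "allow_duplicate_pattern": "0",
    "delay_drop": "0x1",
    "rxq_pkt_pad_en": "0x1",
    "txq_inline_max": "128",
    "txq_inline_mpw": "128",
}


def allow_arg(pci_addr: str, devargs: Dict[str, str]) -> str:
    """Build the value for EAL's -a/--allow option: '<PCI addr>,key=value,...'."""
    return ",".join([pci_addr] + [f"{key}={value}" for key, value in devargs.items()])
```

For example, the result of allow_arg("0000:3b:00.0", MLX5_DEVARGS) would be passed as `dpdk-testpmd -a <string> -- ...`, which makes it easy to drop a single argument (such as delay_drop) when bisecting the problem.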

We have consulted the article “UNDERSTANDING MLX5 ETHTOOL COUNTERS” (https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters) to gain insights into the ‘rx_prio0_buf_discard’ metric. However, the explanations were unclear and lacked actionable guidance for resolving our issue.

Additionally, we tried to address the problem with the tuning tool ‘mlnx_tune’ using the ‘HIGH_THROUGHPUT’ profile. Unfortunately, this did not help: the rx_prio0_buf_discard counter still increases.

We would greatly appreciate any further assistance or recommendations you could provide to help us resolve this matter effectively.

Hi,
The direct cause of rx_prio[i]_buf_discard is that the corresponding receive buffer on the NIC is full because the NIC cannot move packets to the host quickly enough.
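To see when the buffer overflows and correlate it with load, the counter can be sampled over time. A minimal sketch, assuming Linux with ethtool in PATH; the interface name ens4f0np0 is taken from later in this thread, and the helper names are hypothetical.

```python
import re
import subprocess
import time
from typing import Dict

# Matches counter lines such as "     rx_prio0_buf_discard: 1234" in `ethtool -S` output.
COUNTER_RE = re.compile(r"^\s*(\S+):\s+(\d+)\s*$")


def parse_ethtool_stats(text: str) -> Dict[str, int]:
    """Turn `ethtool -S` output into {counter_name: value}; the header line is skipped."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def read_stats(iface: str) -> Dict[str, int]:
    """Run `ethtool -S <iface>` and parse it (raises CalledProcessError on failure)."""
    out = subprocess.run(["ethtool", "-S", iface],
                         capture_output=True, text=True, check=True).stdout
    return parse_ethtool_stats(out)


def counter_rate(before: Dict[str, int], after: Dict[str, int],
                 name: str, seconds: float) -> float:
    """Per-second increase of one counter between two snapshots."""
    return (after.get(name, 0) - before.get(name, 0)) / seconds


def watch(iface: str, interval: float = 1.0, samples: int = 10) -> None:
    """Print rx_prio0 packet and discard rates once per interval."""
    prev, t_prev = read_stats(iface), time.monotonic()
    for _ in range(samples):
        time.sleep(interval)
        cur, t_cur = read_stats(iface), time.monotonic()
        elapsed = t_cur - t_prev  # measured, so ethtool's own runtime is included
        pkts = counter_rate(prev, cur, "rx_prio0_packets", elapsed)
        drops = counter_rate(prev, cur, "rx_prio0_buf_discard", elapsed)
        print(f"{time.strftime('%H:%M:%S')} rx_prio0_packets/s={pkts:,.0f} "
              f"rx_prio0_buf_discard/s={drops:,.1f}")
        prev, t_prev = cur, t_cur
```

For example, watch("ens4f0np0", interval=1.0, samples=60) prints one line per second for a minute; discard spikes can then be lined up with application or system events.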

The root cause can vary widely. It may be back pressure from the CPU, memory, PCIe, or the application itself.

For DPDK, a good starting point is the DPDK performance report.

Apply the “Test Settings” described there first.

Some other tips:

  1. Check PCIe bandwidth: run lspci -vvv -s <BDF> and check that ‘LnkSta’ (the negotiated link) matches ‘LnkCap’ (the link capability).
  2. CPU settings are very important. Consult the CPU vendor and make sure the CPU runs in high-performance mode.
    Don’t use low-power modes or dynamic frequency scaling.
  3. Ask the server vendor whether there are server-specific optimizations for memory and PCIe.
  4. Check the DPDK flow rules. If there are many rules, reduce them for a test to isolate whether the rules are causing the discards.
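The PCIe check in tip 1 can be scripted. A minimal sketch, assuming the usual `lspci -vvv` line format (for example "LnkSta: Speed 16GT/s (ok), Width x16 (ok)"); the function names and the BDF in the usage note are hypothetical, and full capability output usually requires root. The bandwidth figure accounts only for line encoding, not TLP/DLLP protocol overhead.

```python
import re
import subprocess
from typing import Dict, Tuple

# "LnkCap:" / "LnkSta:" lines; the colon right after the name excludes LnkCap2/LnkSta2.
LINK_PATTERNS = {
    "cap": re.compile(r"LnkCap:.*?Speed ([\d.]+)GT/s.*?Width x(\d+)"),
    "sta": re.compile(r"LnkSta:.*?Speed ([\d.]+)GT/s.*?Width x(\d+)"),
}

Link = Tuple[float, int]  # (speed in GT/s, lane count)


def parse_link(lspci_text: str) -> Dict[str, Link]:
    """Extract link capability ("cap") and negotiated status ("sta")."""
    links: Dict[str, Link] = {}
    for key, pattern in LINK_PATTERNS.items():
        match = pattern.search(lspci_text)
        if match:
            links[key] = (float(match.group(1)), int(match.group(2)))
    return links


def link_degraded(links: Dict[str, Link]) -> bool:
    """True if the negotiated speed or width is below the capability."""
    cap, sta = links["cap"], links["sta"]
    return sta[0] < cap[0] or sta[1] < cap[1]


def raw_bandwidth_gbps(speed_gts: float, width: int) -> float:
    """Per-direction bandwidth after line encoding only.

    Gen1/Gen2 (2.5/5 GT/s) use 8b/10b; Gen3 to Gen5 use 128b/130b
    (Gen6 FLIT mode is not modelled).
    """
    efficiency = 0.8 if speed_gts <= 5.0 else 128 / 130
    return speed_gts * width * efficiency


def read_lspci(bdf: str) -> str:
    """Run `lspci -vvv -s <bdf>`; run as root to see the link capabilities."""
    return subprocess.run(["lspci", "-vvv", "-s", bdf],
                          capture_output=True, text=True, check=True).stdout
```

For example, link_degraded(parse_link(read_lspci("3b:00.0"))) flags a link that trained below its capability. A Gen4 x16 link works out to roughly 252 Gb/s per direction before protocol overhead.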

Good luck.

@Levei_Luo Thanks for your reply!

Is this problem related to the mlx5 delay drop feature? The feature was enabled when the discards appeared.

Thanks!

Do you mean PFC?
Yes, that’s possible too.
When PFC is configured, the receive buffer is split into smaller per-priority buffers.
This makes discards more likely to occur.

Use mlnx_qos -i <interface> to check the buffer size, for example:

mlnx_qos -i eth0

DCBX mode: OS controlled
Priority trust state: pcp
default priority:
Receive buffer size (bytes): 0,156096,0,0,0,0,0,0,

Don’t change the buffer size manually on ConnectX-6 Dx.

The delay drop feature is described in the DPDK document: https://doc.dpdk.org/guides/nics/mlx5.html

  • delay_drop parameter [int]

Bitmask value for the Rx queue delay drop attribute. Bit 0 is used for the standard Rx queue and bit 1 is used for the hairpin Rx queue.

By default, delay drop is disabled for all Rx queues. It will be ignored if the port does not support the attribute, even if it is enabled explicitly. Packets being received will not be dropped immediately when the WQEs are exhausted in an Rx queue with delay drop enabled.

A timeout value is set in the driver to control the waiting time before dropping a packet. Once the timer expires, delay drop is deactivated for all the Rx queues with this feature enabled. To re-activate it, rearming is needed; rearming is part of the kernel driver starting from MLNX_OFED 5.5.
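The behaviour quoted above can be made concrete with a small state model. This is a sketch for intuition only: the class, its "held"/"delivered"/"dropped" outcomes and the timeout value are hypothetical, and the real logic lives in the NIC and kernel driver. It also decodes the delay_drop bitmask, so the devarg used in this setup (delay_drop=0x1) enables delay drop on standard Rx queues only.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

STANDARD_RXQ_BIT = 1 << 0  # bit 0: standard Rx queues
HAIRPIN_RXQ_BIT = 1 << 1   # bit 1: hairpin Rx queues


def decode_delay_drop(value: Union[int, str]) -> Dict[str, bool]:
    """Decode the delay_drop devarg (e.g. "0x1") into the queue types it enables."""
    bits = int(value, 0) if isinstance(value, str) else value
    return {"standard": bool(bits & STANDARD_RXQ_BIT),
            "hairpin": bool(bits & HAIRPIN_RXQ_BIT)}


@dataclass
class DelayDropModel:
    """Toy model of delay drop (NOT driver code).

    One instance stands for the timer shared by all delay-drop Rx queues of a PF.
    timeout_us is a hypothetical value; the real timeout is set in the driver.
    """
    timeout_us: float
    active: bool = True
    stalled_since_us: Optional[float] = None

    def on_packet(self, now_us: float, wqe_available: bool) -> str:
        if wqe_available:
            self.stalled_since_us = None
            return "delivered"
        if not self.active:
            return "dropped"          # deactivated: drop immediately, as without delay drop
        if self.stalled_since_us is None:
            self.stalled_since_us = now_us
        if now_us - self.stalled_since_us >= self.timeout_us:
            self.active = False       # timer expired: deactivate for all queues
            return "dropped"
        return "held"                 # wait for the application to post WQEs

    def rearm(self) -> None:
        """Re-activate delay drop (done by the kernel driver when dropless_rq is on)."""
        self.active = True
        self.stalled_since_us = None
```

The point of the model is the state transition: after one timeout, delay drop stays off until it is rearmed, so later stalls drop packets immediately.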

To enable / disable the delay drop rearming, the private flag dropless_rq can be set and queried via ethtool:

  • ethtool --set-priv-flags <PF interface> dropless_rq on (/ off)
  • ethtool --show-priv-flags <PF interface>

The configuration flag is global per PF and can only be set on the PF, once it is on, all the VFs’, SFs’ and representors’ Rx queues will share the timer and rearming.
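Because the flag is global per PF, it is worth confirming its state on the PF before a test run. A minimal sketch, assuming ethtool’s usual "flag_name : on|off" output format; the helper names are hypothetical.

```python
import subprocess
from typing import Dict


def parse_priv_flags(text: str) -> Dict[str, bool]:
    """Parse `ethtool --show-priv-flags <dev>` output into {flag: enabled}."""
    flags: Dict[str, bool] = {}
    for line in text.splitlines():
        name, sep, state = line.partition(":")
        state = state.strip()
        if sep and state in ("on", "off"):  # skips the "Private flags for <dev>:" header
            flags[name.strip()] = state == "on"
    return flags


def dropless_rq_enabled(pf_iface: str) -> bool:
    """Query the PF; returns False if the driver does not expose dropless_rq."""
    out = subprocess.run(["ethtool", "--show-priv-flags", pf_iface],
                         capture_output=True, text=True, check=True).stdout
    return parse_priv_flags(out).get("dropless_rq", False)
```

For example, dropless_rq_enabled("ens4f0np0") reports whether delay drop rearming is active for that port.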

Here is the output of the mlnx_qos command for ens4f0np0. The output is the same whether the delay drop feature is enabled or not.

# mlnx_qos -i ens4f0np0
DCBX mode: OS controlled
Priority trust state: pcp
default priority:
Receive buffer size (bytes): 20352,0,0,0,0,0,0,0,
Cable len: 7
PFC configuration:
        priority    0   1   2   3   4   5   6   7
        enabled     0   0   0   0   0   0   0   0   
        buffer      0   0   0   0   0   0   0   0   
tc: 1 ratelimit: unlimited, tsa: vendor
         priority:  0
tc: 0 ratelimit: unlimited, tsa: vendor
         priority:  1
tc: 2 ratelimit: unlimited, tsa: vendor
         priority:  2
tc: 3 ratelimit: unlimited, tsa: vendor
         priority:  3
tc: 4 ratelimit: unlimited, tsa: vendor
         priority:  4
tc: 5 ratelimit: unlimited, tsa: vendor
         priority:  5
tc: 6 ratelimit: unlimited, tsa: vendor
         priority:  6
tc: 7 ratelimit: unlimited, tsa: vendor
         priority:  7
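In this output every priority maps to buffer 0, which is 20,352 bytes. A back-of-the-envelope sketch of how long that buffer can absorb traffic if the host stops draining it entirely: it ignores concurrent draining, any headroom reserved inside the buffer and per-packet overheads. The 6.59318 Gb/s figure is the rx rate from the mlnx_perf sample; 100 Gb/s is an assumed port speed, since the thread does not state it.

```python
from typing import List


def parse_rx_buffers(line: str) -> List[int]:
    """Parse mlnx_qos's 'Receive buffer size (bytes): a,b,...,' line into ints."""
    _, _, values = line.partition(":")
    return [int(v) for v in values.split(",") if v.strip()]


def fill_time_us(buffer_bytes: int, ingress_gbps: float) -> float:
    """Microseconds for an undrained buffer to fill at the given ingress rate.

    Worst case only: assumes the host drains nothing while the burst lasts.
    """
    return buffer_bytes * 8 / (ingress_gbps * 1e3)


if __name__ == "__main__":
    buffers = parse_rx_buffers("Receive buffer size (bytes): 20352,0,0,0,0,0,0,0,")
    for rate in (6.59318, 100.0):  # observed rx rate; assumed 100G line rate
        print(f"{buffers[0]} B at {rate} Gb/s: {fill_time_us(buffers[0], rate):.2f} us")
```

At the observed rate this gives about 25 µs, and about 1.6 µs at 100 Gb/s line rate, so even brief host-side stalls can overflow a buffer of this size.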