rx_discards_phy

I have a problem with a new Mellanox ConnectX-4 Lx EN 50Gbps card.

I am using it in an old server: dual Xeon X5650, 128 GB DDR3-1333, PCIe 2.0 x8.
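For reference, the bandwidth of a PCIe 2.0 x8 slot follows from the link parameters. Each lane runs at 5 GT/s with 8b/10b encoding, which gives 4 Gbit/s per lane per direction. The sketch below does that arithmetic. It ignores TLP/DLLP packet overhead, which takes a larger share of the link with small packets, so treat the result as an upper bound rather than an achievable DMA rate. The function name is illustrative.

```python
# Per-direction PCIe link bandwidth after line encoding.
# PCIe 1.x/2.0 use 8b/10b encoding; PCIe 3.0 and later use 128b/130b.
# Protocol overhead (TLP headers, DLLPs) is NOT subtracted, so real
# DMA throughput will be lower than these figures.

GT_PER_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}
ENCODING = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130, 4: 128 / 130}


def pcie_gbit_per_s(gen: int, lanes: int) -> float:
    """Per-direction bandwidth in Gbit/s after encoding overhead."""
    return GT_PER_S[gen] * ENCODING[gen] * lanes


if __name__ == "__main__":
    gbit = pcie_gbit_per_s(gen=2, lanes=8)
    print(f"PCIe 2.0 x8: {gbit:.1f} Gbit/s = {gbit / 8:.1f} GB/s per direction")
```

This is handy when comparing against the bus figure quoted later in the thread.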

The settings are the RHEL 8.3 defaults plus the following:

ethtool --set-priv-flags eth2 rx_cqe_compress on

ethtool -C eth2 adaptive-rx off

ethtool -G eth2 rx 8192 tx 8192

setpci -s 06:00.0 68.w=5936

ethtool -A eth2 autoneg off rx off tx off

ifconfig eth2 txqueuelen 20000

ethtool -L eth2 combined 12

service irqbalance stop

<IRQ smp_affinity set to 12 cores on NUMA node 0, the same node as the card>
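The setpci line writes 0x5936 into the 16-bit register at configuration offset 0x68. Assuming that offset is the card's PCIe Device Control register (true if the PCIe capability starts at 0x60; check with lspci -vvv -s 06:00.0), the value can be decoded with the field layout from the PCI Express specification. A minimal sketch under that assumption; the function name is illustrative:

```python
# Decode a 16-bit PCIe Device Control register value.
# ASSUMPTION: offset 0x68 is the Device Control register of this card's
# PCIe capability; verify with `lspci -vvv -s 06:00.0` first.


def decode_devctl(value: int) -> dict:
    return {
        "correctable_err_report": bool(value & (1 << 0)),
        "nonfatal_err_report": bool(value & (1 << 1)),
        "fatal_err_report": bool(value & (1 << 2)),
        "unsupported_req_report": bool(value & (1 << 3)),
        "relaxed_ordering": bool(value & (1 << 4)),
        # Bits 7:5: Max_Payload_Size = 128 << n bytes (n = 6, 7 reserved).
        "max_payload_bytes": 128 << ((value >> 5) & 0x7),
        "extended_tag": bool(value & (1 << 8)),
        "phantom_functions": bool(value & (1 << 9)),
        "aux_power_pm": bool(value & (1 << 10)),
        "no_snoop": bool(value & (1 << 11)),
        # Bits 14:12: Max_Read_Request_Size = 128 << n bytes.
        "max_read_request_bytes": 128 << ((value >> 12) & 0x7),
    }


if __name__ == "__main__":
    for name, field in decode_devctl(0x5936).items():
        print(f"{name:24} {field}")
```

Under that assumption, 0x5936 sets Max_Read_Request_Size to 4096 bytes and Max_Payload_Size to 256 bytes, and enables relaxed ordering, extended tags and no-snoop.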

I tested the card with an XDP program that returns XDP_DROP, and I see errors in ethtool -S and packet loss:

rx_xdp_drop: 3801290644

rx_discards_phy: 1296930300

rx_buffer_passed_thres_phy: 7049089607

rx_pci_signal_integrity: 0

tx_pci_signal_integrity: 12

outbound_pci_stalled_rd: 0

outbound_pci_stalled_wr: 0

outbound_pci_stalled_rd_events: 0

outbound_pci_stalled_wr_events: 1076

rx_discards_phy grows together with rx_xdp_drop and amounts to about 27% of the traffic. outbound_pci_stalled_wr stays in the 50-70 range, and outbound_pci_stalled_wr_events keeps growing.
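Because ethtool -S counters are cumulative, the discard share is best measured from the difference between two snapshots taken a few seconds apart. A minimal sketch, assuming every packet that reaches the XDP program is dropped by it, so that packets arriving at the port are roughly rx_xdp_drop + rx_discards_phy. The helper names are illustrative:

```python
# Share of packets lost at the port (rx_discards_phy) between two
# snapshots of `ethtool -S eth2`.
# ASSUMPTION: all packets reaching XDP are dropped there, so
# arrivals ~= rx_xdp_drop + rx_discards_phy.


def parse_ethtool_stats(text: str) -> dict:
    """Parse `ethtool -S` output ("name: value" lines) into a dict."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        # Skip the "NIC statistics:" header and any non-numeric lines.
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def discard_fraction(before: dict, after: dict) -> float:
    dropped_by_xdp = after["rx_xdp_drop"] - before["rx_xdp_drop"]
    discarded = after["rx_discards_phy"] - before["rx_discards_phy"]
    total = dropped_by_xdp + discarded
    return discarded / total if total else 0.0


if __name__ == "__main__":
    # Counter values quoted in the post, compared with a zero baseline.
    snapshot = parse_ethtool_stats(
        "rx_xdp_drop: 3801290644\nrx_discards_phy: 1296930300\n"
    )
    zero = {"rx_xdp_drop": 0, "rx_discards_phy": 0}
    print(f"{discard_fraction(zero, snapshot):.1%}")  # prints 25.4%
```

In practice the two snapshots would come from running ethtool -S eth2 twice, for example with subprocess.run(["ethtool", "-S", "eth2"], capture_output=True, text=True).stdout and a sleep in between. The zero-baseline figure averages over the whole counter lifetime, whereas deltas reflect the current rate.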

The test traffic is 6 Mpps / 3 Gbps, of which about 1.7 Mpps are dropped. What am I doing wrong? Thanks.
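The quoted load implies an average frame of about 62.5 bytes, close to the 64-byte Ethernet minimum (the figures are presumably rounded). With frames that small, the load is better described in packets per second than in Gbit/s. A back-of-the-envelope sketch, assuming "3 Gbps" counts frame bits only, without the 20 bytes of preamble and inter-frame gap each frame also occupies on the wire:

```python
# Back-of-the-envelope numbers for the test traffic in the post.
# ASSUMPTION: "3 Gbps" excludes preamble + inter-frame gap (20 bytes/frame).

PPS = 6_000_000          # offered load, packets/s
BPS = 3_000_000_000      # offered load, bits/s
DROPPED_PPS = 1_700_000  # packets/s lost

avg_frame_bytes = BPS / PPS / 8
drop_share = DROPPED_PPS / PPS
wire_gbps = PPS * (avg_frame_bytes + 20) * 8 / 1e9

print(f"average frame size: {avg_frame_bytes:.1f} bytes")  # 62.5 bytes
print(f"drop share: {drop_share:.1%}")                     # 28.3%
print(f"on the wire incl. preamble/IFG: {wire_gbps:.2f} Gbit/s")  # 3.96
```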

Hi,

Please refer to the following community article:

https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters

rx_discards_phy

The number of received packets dropped due to lack of buffers on a physical port. If this counter is increasing, it implies that the adapter is congested and cannot absorb the traffic coming from the network.

Regarding performance tuning, please refer to the following community article:

https://community.mellanox.com/s/article/performance-tuning-for-mellanox-adapters

If you need further assistance with debugging, please contact our support team at:

networking-support@nvidia.com

Thanks,

Samer

Of course, I have followed all of the tuning recommendations and the counter documentation. Can you give a specific answer to my questions? The card is 25 Gbit, the PCIe bus bandwidth in this case is 32 Gbit/s (4 GB/s) per direction, CPU load is no more than 25%, and all interrupts are distributed and pinned to their cores. Yet the card cannot handle even 3 Gbit/s, and the error counter starts growing at 600-700 Mbps. What specific steps do you recommend to solve the problem?
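Pinning each queue's interrupt to its own CPU on the card's NUMA node, as described in this thread, can be scripted. A minimal sketch that turns a kernel cpulist (for example the contents of /sys/class/net/eth2/device/local_cpulist) into one /proc/irq/<N>/smp_affinity mask per IRQ. The cpulist string and the IRQ numbers below are hypothetical; real IRQ numbers come from /proc/interrupts, and writing the masks requires root:

```python
# Plan one-CPU-per-IRQ smp_affinity masks on the card's NUMA node.
# ASSUMPTION: cpulist comes from /sys/class/net/eth2/device/local_cpulist;
# IRQ numbers used in the example are hypothetical.


def parse_cpulist(cpulist: str) -> list[int]:
    """Expand a kernel cpulist such as "0-5,12-17" into CPU ids."""
    cpus = []
    for part in cpulist.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def affinity_mask(cpu: int) -> str:
    """Hex mask for one CPU, as comma-separated 32-bit words."""
    mask = 1 << cpu
    words = []
    while True:
        words.append(format(mask & 0xFFFFFFFF, "08x"))
        mask >>= 32
        if not mask:
            break
    return ",".join(reversed(words))


def plan_affinity(irqs: list[int], cpulist: str) -> dict[int, str]:
    """Assign IRQs round-robin to the node-local CPUs."""
    cpus = parse_cpulist(cpulist)
    return {irq: affinity_mask(cpus[i % len(cpus)]) for i, irq in enumerate(irqs)}


if __name__ == "__main__":
    # Hypothetical IRQ numbers and node-0 cpulist, for illustration only.
    for irq, mask in plan_affinity([100, 101, 102], "0-5,12-17").items():
        print(f"echo {mask} > /proc/irq/{irq}/smp_affinity")
```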