Hi there,
I have a network of Ryzen servers with ConnectX-4 Lx adapters (MT27710 family) running a fairly intense workload with a lot of small-packet WebSocket traffic. We're noticing that the rx_prio0_discards counter keeps climbing even after we replaced the NIC and increased the ring buffer to 8192:
Ring parameters for enp65s0f1np1:
Pre-set maximums:
RX: 8192
RX Mini: 0
RX Jumbo: 0
TX: 8192
Current hardware settings:
RX: 8192
RX Mini: 0
RX Jumbo: 0
TX: 8192
Any ideas or suggestions here?
Hello Alexander,
Thank you for posting your question on the Mellanox Community.
Since the discards keep increasing, there is a chance that the adapter or the host is not processing packets quickly enough. We would advise tuning your system in accordance with our performance tuning guide, which can be found here:
https://community.mellanox.com/s/article/performance-tuning-for-mellanox-adapters
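Before tuning, it can help to confirm which counters are actually moving. A rough sketch, assuming the interface name from your ethtool output (watching rx_out_of_buffer alongside the per-priority discards helps distinguish a host that drains rings too slowly from receive-buffer pressure on the adapter):

```shell
# Filter the mlx5 counters most relevant to drops.
# rx_out_of_buffer rising suggests the host is not draining the rings fast
# enough; rx_prioX_discards rising on its own points at buffer pressure.
ethtool -S enp65s0f1np1 | awk '/rx_prio[0-7]_discards|rx_out_of_buffer/ {print $1, $2}'
```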
The following tuning guide may also help, as you are using an AMD architecture. Please note that it may not be 100% applicable, as it is aimed at the AMD EPYC datacenter processors rather than the Ryzen line.
https://support.mellanox.com/s/article/how-to-tune-an-amd-server--eypc-cpu--for-maximum-performance
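Much of that guide boils down to keeping interrupts and the application local to the adapter's NUMA node. A minimal sketch, using the interface name from your post (the set_irq_affinity.sh helper mentioned below ships with Mellanox OFED / mlnx-tools):

```shell
# Which NUMA node is the adapter attached to? (-1 means no locality info)
numa=/sys/class/net/enp65s0f1np1/device/numa_node
[ -r "$numa" ] && cat "$numa"

# List the mlx5 completion-queue IRQ numbers so they can be pinned to cores
# on that node (e.g. via set_irq_affinity.sh or /proc/irq/<N>/smp_affinity).
awk '/mlx5/ {sub(":", "", $1); print $1}' /proc/interrupts
```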
You may also be able to improve performance by enabling IOMMU passthrough. We have seen improvements in the past when using this option with AMD EPYC processors.
https://support.mellanox.com/s/article/understanding-the-iommu-linux-grub-file-configuration
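For reference, on most distributions enabling passthrough means adding iommu=pt to the kernel command line. A sketch assuming a GRUB-based system (the grub.cfg path and regeneration command vary by distro; see the article above for details):

```shell
# In /etc/default/grub, append iommu=pt to the kernel command line, e.g.:
#   GRUB_CMDLINE_LINUX="... iommu=pt"
# Then regenerate the config and reboot:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # or update-grub on Debian/Ubuntu
# After rebooting, confirm the option took effect:
grep -o 'iommu=pt' /proc/cmdline
```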
The following page may also help to better understand the counters:
https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters
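When reading those counters, it is usually the rate of change rather than the absolute value that matters. A minimal sketch that samples rx_prio0_discards once per second and prints the delta, so drops can be correlated with traffic bursts (interface name from your post):

```shell
read_discards() {
  # Extract the current rx_prio0_discards value; default to 0 if unavailable.
  v=$(ethtool -S enp65s0f1np1 2>/dev/null | awk '/rx_prio0_discards/ {print $2}')
  echo "${v:-0}"
}

prev=$(read_discards)
for _ in 1 2 3 4 5; do
  sleep 1
  cur=$(read_discards)
  echo "discards/sec: $((cur - prev))"
  prev=$cur
done
```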
If you need any further assistance, please open a case through the Mellanox support portal using an account with a valid support contract:
https://support.mellanox.com/s/
Thanks and regards,
~Mellanox Technical Support