Debug rx_discards_phy on ConnectX-6

I’m testing a Mellanox 100GbE ConnectX-6 with xdp-bench drop. After reaching about 110 Mpps (64-byte packets), the rx_discards_phy counter starts increasing. Even with larger packets (256 bytes), I can’t reach line rate. CPU usage is below 5%. Is there a way to track down what is causing the discards?
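
This is roughly how I watch the counters while the test runs (rx_packets_phy and rx_discards_phy are the mlx5 PHY-level counters exposed by ethtool -S):

# sample PHY-level received vs. discarded packets once per second
while sleep 1; do
    ethtool -S enp59s0np0 | grep -E 'rx_packets_phy|rx_discards_phy'
done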

  • NIC model: CX653105A (fw 20.42.1000)
  • LnkSta: Speed 8GT/s (ok), Width x16 (ok)
  • OFED driver version: 24.07
  • Kernel: 6.10.8
  • CPU: dual Intel Xeon Platinum 8273CL (28 cores) (queues are pinned to the local CPU)
  • mlnx_tune is executed at startup
  • BIOS performance profile set to Maximum Performance
  • PCIe max read request size increased from 512 to 4096 bytes (see the checks after this list)
  • CQE_COMPRESSION=1
  • HyperThreading is disabled
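
For reference, this is how I verify the PCIe link state / MaxReadReq and the CQE compression setting. The 3b:00.0 BDF is inferred from the interface name enp59s0np0, and the /dev/mst device path is an assumption (use whatever mst status reports):

# check negotiated PCIe speed/width and the effective MaxReadReq
lspci -vvv -s 3b:00.0 | grep -E 'LnkSta:|MaxReadReq'
# check CQE compression in firmware config (device path assumed)
mst start
mlxconfig -d /dev/mst/mt4123_pciconf0 query | grep CQE_COMPRESSION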

I noticed that if I set hfunc to xor instead of toeplitz, I can reach line rate with 256-byte packets, but not with 64-byte packets (though this is because some cores go to 100%, probably due to traffic polarization). In fact, it seems that Toeplitz hashing reduces the NIC’s performance.
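
For reference, this is how I switch the RSS hash function between the two (everything else in the RSS setup is left unchanged):

# switch the RSS hash function, then verify which one is active
ethtool -X enp59s0np0 hfunc xor
ethtool -x enp59s0np0 | grep -A3 -i 'hash function'
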
I also noticed that by restricting the assigned cores to 16, I can reach about 120 Mpps:
ethtool -L enp59s0np0 combined 16
set_irq_affinity_cpulist.sh 0-15 enp59s0np0
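
For completeness, this is how I confirm the channel count took effect and check the per-queue spread (rx<N>_packets are the mlx5 per-ring counters; a skewed spread would match the polarization theory above):

# confirm 16 combined channels are active
ethtool -l enp59s0np0
# per-queue packet distribution
ethtool -S enp59s0np0 | grep -E '^ +rx[0-9]+_packets'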