ConnectX-7 poor UDP performance vs ConnectX-6/ConnectX-5

Hello,

I am seeing heavy UDP packet loss on a ConnectX-7 card, but not on a ConnectX-5 or ConnectX-6. My setup is a PC with a CX-6 PCIe adapter connected to a Jetson that I have tested with each of a CX-5, a CX-6, and a CX-7. I am using iperf3 to generate the traffic.

When I raise the MTU to use jumbo frames (and run iperf3 -u -b 10G <...>), the CX-7 starts receiving UDP packets with a very large number of drops, and the connection eventually chokes completely, with 100% loss. The CX-7 will not receive any more packets until I wait about 30 seconds and start again. As I mentioned, this does not happen at all on the CX-6 or CX-5, and I have seen the same behavior on three different CX-7s, which makes me think this is not a defect in one particular card.
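For scale, here is a back-of-the-envelope sketch of the receive packet rate a 10G UDP test implies. It assumes a 9000-byte MTU (the exact jumbo MTU is not stated above), IPv4 without options, and datagrams sized to fill one frame (for example by setting iperf3's -l option). The function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options
UDP_HEADER_BYTES = 8


def udp_payload_bytes(mtu: int) -> int:
    """Largest UDP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - UDP_HEADER_BYTES


def packets_per_second(rate_bps: float, mtu: int) -> float:
    """Datagrams per second needed to carry rate_bps of UDP payload."""
    return rate_bps / (udp_payload_bytes(mtu) * 8)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        pps = packets_per_second(10e9, mtu)
        print(f"MTU {mtu}: ~{pps:,.0f} datagrams/s at 10 Gbit/s")
```

At 10 Gbit/s this works out to roughly 139k datagrams per second with 9000-byte frames versus roughly 849k with 1500-byte frames. Raising the MTU therefore cuts the packet rate the receiver has to handle by about a factor of six.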

Oddly, the missed and dropped packet counters are also inconsistent when this happens: the lost packets are counted as missed, counted as dropped, or not recorded in either counter.
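One way to see which counters actually move is to snapshot ethtool -S <iface> before and after a run and diff only the loss-related counters. Below is a minimal Python sketch that assumes the usual "name: value" output layout. The substring filter is a heuristic, and the mlx5-style counter names in the example (rx_out_of_buffer, rx_discards_phy) are illustrative and may differ between driver versions.

```python
import re
from typing import Dict

# Matches counter lines such as "     rx_packets: 12345" in `ethtool -S` output;
# the "NIC statistics:" header line does not match.
_STAT_LINE = re.compile(r"^\s*([\w\[\]./-]+):\s*(-?\d+)\s*$")

# Heuristic substrings for loss-related counters (assumption; adjust per driver).
LOSS_HINTS = ("drop", "discard", "missed", "out_of_buffer", "error")


def parse_ethtool_stats(text: str) -> Dict[str, int]:
    """Turn `ethtool -S` text into a {counter_name: value} dict."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        m = _STAT_LINE.match(line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def loss_counter_deltas(before: str, after: str) -> Dict[str, int]:
    """Return the non-zero changes in loss-related counters between two snapshots."""
    b = parse_ethtool_stats(before)
    a = parse_ethtool_stats(after)
    deltas: Dict[str, int] = {}
    for name, value in a.items():
        if any(hint in name for hint in LOSS_HINTS):
            delta = value - b.get(name, 0)
            if delta:
                deltas[name] = delta
    return deltas
```

For example, with a "before" snapshot showing rx_out_of_buffer: 5 and rx_discards_phy: 0 and an "after" snapshot showing 42 and 3, loss_counter_deltas returns {"rx_out_of_buffer": 37, "rx_discards_phy": 3} and ignores counters such as rx_packets.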

Also, this only seems to affect regular Layer 3 IP traffic over Ethernet. InfiniBand Verbs traffic does not seem to be affected at all: I consistently get 80-90 Gb/s over RoCEv2 with ib_read_bw without any problems.

When I turn off adaptive-rx (using ethtool -C ...), the problem mostly goes away, but I still see 2-5% loss at 10G. Is this a firmware issue, a software issue, or both? I'm happy to provide more details as needed.