Eqos hangs with heavy inbound traffic

We have a Xavier on a custom carrier board that is very similar to the dev kit. We’re using JetPack 4.2.2.

When there is heavy inbound scp traffic into the Xavier on the wired ethernet port (using the eqos driver), the port will often hang. The rx packet count stops incrementing in this state. A reboot fixes it, obviously, but so do ifconfig down/up and a network manager restart. This is very easy to reproduce - copy a large file to the Xavier over the wired link and it will fail within 30 seconds.
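
For anyone trying to catch the hang as it happens, here is a minimal sketch of a watcher that polls the interface's rx packet counter and reports when it stops moving. It reads the standard Linux sysfs counter under /sys/class/net. The interface name eth0, the polling interval, and the window size are assumptions for illustration, not values from this setup. A flat counter only indicates a hang while a transfer is actually running; an idle link looks the same.

```python
import time
from pathlib import Path


def read_rx_packets(iface, sysfs_root="/sys/class/net"):
    """Return the kernel's received-packet counter for one interface."""
    counter = Path(sysfs_root, iface, "statistics", "rx_packets")
    return int(counter.read_text())


def is_stalled(samples):
    """True when the counter did not change across two or more samples."""
    return len(samples) >= 2 and len(set(samples)) == 1


def watch(iface="eth0", interval_s=1.0, window=5, sysfs_root="/sys/class/net"):
    """Poll until rx_packets has been flat for `window` consecutive samples.

    Returns the counter value at which it stopped. "eth0" is an assumed
    interface name; substitute the name of the eqos port on the board.
    """
    samples = []
    while True:
        samples.append(read_rx_packets(iface, sysfs_root))
        samples = samples[-window:]  # keep only the most recent samples
        if len(samples) == window and is_stalled(samples):
            return samples[-1]
        time.sleep(interval_s)
```

Running watch() in one terminal while the scp copy runs in another would give a timestampable point at which receive stopped, which can then be lined up against dmesg or a packet capture.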

The problem only happens with heavy inbound traffic. Outbound traffic isn’t a problem.

The wireless card still works, and I can get into the Xavier that way even when the eqos port is hung, so the network stack isn’t completely borked.

There are no obvious kernel errors in the logs. I also tried a JetPack 4.4 eqos driver (the 4.4 driver source built in the 4.2.2 tree) on the Xavier, and it behaved exactly the same.

Curiously, this problem doesn’t happen when pushing data to the Xavier using other networking tools like netcat (TCP) or iperf. It’s only a problem for scp inbound traffic. Wireshark shows what you would expect - the Xavier behaves like it’s not receiving any more packets on that port.

I’ve also tried disabling things like scatter-gather, TSO, etc. on the Xavier with no improvement.
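
For reference, offloads like these are normally toggled with ethtool -K. The sketch below builds and runs that command from Python. The interface name eth0 is an assumption, and the feature list is only illustrative: sg and tso correspond to the scatter-gather and TSO mentioned above, while gso and gro are added as examples and are not confirmed to be what was tried here.

```python
import subprocess

# Short feature names accepted by `ethtool -K`.
# sg/tso are from the post; gso/gro are illustrative assumptions.
OFFLOADS = ("sg", "tso", "gso", "gro")


def offload_command(iface, features=OFFLOADS, state="off"):
    """Build the ethtool argument list that sets each feature to `state`."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, state]
    return cmd


def disable_offloads(iface="eth0"):
    """Run ethtool to switch the listed offloads off (needs root)."""
    subprocess.run(offload_command(iface), check=True)
```

Calling offload_command("eth0") yields the equivalent of `ethtool -K eth0 sg off tso off gso off gro off`; the current settings can be checked with `ethtool -k eth0`.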

Any bright ideas would be appreciated.

Thanks
Jim

Hi Jim,

Could you do a full upgrade to JetPack 4.4 instead of only updating the eqos driver?

If you can still reproduce the issue with JetPack 4.4, please share your steps so that we can reproduce it. Thanks.

We now think this is an issue with our PHY. We’re still digging into the root cause.

Thanks for responding.
Jim