AGX Xavier Devkit causes pause frame storm

We have a Jetson AGX Xavier Devkit connected to a 5-port Ethernet switching hub, together with other computers and an upstream link to the Internet.
We often see all the computers connected to the switching hub become unreachable from anywhere, even though the computers themselves are still running.
While digging into the issue, I found that the Xavier constantly broadcasts pause frames, which makes all network devices on the same network segment stop sending packets.
I could not find any suspicious log messages in journald or dmesg on the Xavier.
How can I fix this issue?

Thank you so much in advance.
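
For readers who want to check a capture for these frames programmatically, here is a minimal sketch in Python. It relies on the IEEE 802.3x layout: a PAUSE frame is a MAC control frame with EtherType 0x8808 and opcode 0x0001, usually addressed to the reserved multicast address 01:80:C2:00:00:01, followed by a 16-bit pause time counted in quanta of 512 bit times. The function names are hypothetical, and the sketch assumes an untagged frame (no VLAN header) passed in as raw bytes starting at the destination MAC.

```python
import struct

# Reserved multicast destination normally used for PAUSE frames (IEEE 802.3x).
PAUSE_DST_MAC = bytes.fromhex("0180c2000001")
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def parse_pause_frame(frame: bytes):
    """Return the pause time in quanta if `frame` is a PAUSE frame, else None.

    Assumes an untagged Ethernet frame: 6-byte destination MAC, 6-byte source
    MAC, then EtherType, MAC control opcode and pause time (2 bytes each).
    """
    if len(frame) < 18:
        return None
    ethertype, opcode, quanta = struct.unpack("!HHH", frame[12:18])
    if ethertype != MAC_CONTROL_ETHERTYPE or opcode != PAUSE_OPCODE:
        return None
    return quanta


def pause_duration_seconds(quanta: int, link_bps: int) -> float:
    """Convert pause quanta to seconds; one quantum is 512 bit times."""
    return quanta * 512 / link_bps


if __name__ == "__main__":
    # Hand-built example frame: made-up source MAC, maximum pause time.
    src = bytes.fromhex("020000000001")
    frame = PAUSE_DST_MAC + src + struct.pack("!HHH", 0x8808, 0x0001, 0xFFFF)
    frame += bytes(42)  # pad to the 60-byte minimum frame size (without FCS)
    quanta = parse_pause_frame(frame)
    print(quanta, pause_duration_seconds(quanta, 1_000_000_000))
```

On a 1 Gb/s link the maximum pause time of 0xFFFF quanta is about 33.6 ms, so a sender has to keep repeating PAUSE frames to hold a link partner silent, which matches a constant stream of them.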

Can this be reproduced with any arbitrary switch/hub?

I mean if I want to reproduce this issue, what should I connect and what should I do?

Also, which JetPack release is in use?

Here is my network configuration:

5-port Gigabit Switching hub (TP-Link TL-SG105) connected to:

  • Router (192.168.5.1/16)
  • Jetson Xavier Devkit (which broadcasts pause frames)
  • Jetson Xavier + ConnectTech Rogue (it does not broadcast pause frames)
  • Jetson TX2i + ConnectTech Quasar (it does not broadcast pause frames)
  • Raspberry Pi 4B

It may take a few hours to a few days before the pause frame storm appears, and I’m still not sure what actually triggers it.

And the configuration of Xavier is:

  • /etc/nv_tegra_release
R32 (release), REVISION: 6.1, GCID: 27863751, BOARD: t186ref, EABI: aarch64, DATE: Mon Jul 26 19:36:31 UTC 2021
  • uname -srvmpi
Linux 4.9.253-tegra #1 SMP PREEMPT Mon Jul 26 12:19:28 PDT 2021 aarch64 aarch64 aarch64
  • nmcli connection show 'Wired Connection 1'
ipv4.method:                            manual
ipv4.dns:                               192.168.5.1
ipv4.dns-search:                        foo.bar.net
ipv4.dns-options:                       --
ipv4.dns-priority:                      0
ipv4.addresses:                         192.168.5.15/24
ipv4.gateway:                           192.168.5.1
ipv4.routes:                            --
ipv4.route-metric:                      -1
ipv4.route-table:                       0 (unspec)
ipv4.routing-rules:                     --
ipv4.ignore-auto-routes:                no
ipv4.ignore-auto-dns:                   no
ipv4.dhcp-client-id:                    --
ipv4.dhcp-iaid:                         --
ipv4.dhcp-timeout:                      0 (default)
ipv4.dhcp-send-hostname:                yes
ipv4.dhcp-hostname:                     --
ipv4.dhcp-fqdn:                         --
ipv4.dhcp-hostname-flags:               0x0 (none)
ipv4.never-default:                     no
ipv4.may-fail:                          yes
ipv4.dad-timeout:                       -1 (default)
  • ethtool -k eth0
Features for eth0:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: on
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: off [fixed]
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
	tx-tcp-segmentation: off
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp-mangleid-segmentation: off
	tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
hw-tc-offload: off [fixed]

If I use a hub other than the TP-Link TL-SG105, would I still be able to reproduce this issue?

Also, I may not be able to gather 4 devices on the hub. Will that affect the test?

@WayneWWW

If I use a hub other than the TP-Link TL-SG105, would I still be able to reproduce this issue?

I’m not sure, but IMO it should not matter which switching hub is used, since the pause frames are sent from the JAX, not the hub.

Also, I may not be able to gather 4 devices on the hub. Will that affect the test?

I did not check whether it happens without the 4 devices. They constantly send/receive data to/from the JAX, so disconnecting them would at least change the network traffic.
What I did confirm is that once the JAX starts sending pause frames, it keeps sending them constantly, regardless of whether the other 4 devices are connected or disconnected.

How frequently does this happen? Is there any other method to notice this without using Wireshark?

I’m just wondering whether I would need to wait something like 10 hours just to see one pause frame.
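
One possible way to watch for this without a packet capture is to poll the NIC statistics that `ethtool -S <interface>` prints and look for pause-related counters. The sketch below is Python and makes two loud assumptions: that the Ethernet driver on the Xavier exposes pause counters at all, and that their names contain the word "pause". Counter names are driver-specific, so the names in the sample text are hypothetical, as are the function names.

```python
import subprocess


def pause_counters(stats_text: str) -> dict:
    """Extract counters whose name mentions 'pause' from `ethtool -S` output.

    Each statistics line has the form "name: value"; lines that do not match
    (such as the "NIC statistics:" header) are skipped.
    """
    counters = {}
    for line in stats_text.splitlines():
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        name, value = name.strip(), value.strip()
        if "pause" in name.lower() and value.isdigit():
            counters[name] = int(value)
    return counters


def read_pause_counters(ifname: str = "eth0") -> dict:
    """Run `ethtool -S` on the given interface and return its pause counters."""
    result = subprocess.run(
        ["ethtool", "-S", ifname],
        capture_output=True, text=True, check=True,
    )
    return pause_counters(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample output; real counter names depend on the driver.
    sample = (
        "NIC statistics:\n"
        "     tx_pause_frames: 12\n"
        "     rx_pause_frames: 0\n"
        "     rx_packets: 100\n"
    )
    print(pause_counters(sample))
```

Calling `read_pause_counters()` periodically (for example from a cron job or a small loop) and logging the values would show when a transmit-side counter starts climbing, if the driver reports one. An empty result only means no counter with "pause" in its name was found, not that no pause frames were sent.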