AGX Thor QSFP MGBE Link Instability Between Two DevKits

Hi,

We have two AGX Thor DevKits, both running JetPack 7.1, with all MGBE interfaces configured for 10 Gbps.

When the two AGX Thor DevKits are connected to each other over the QSFP interface, the MGBE links are unstable. We also see periodic kernel log messages, shown below.

The QSFP transceiver in use is listed below.

For reference, the MGBE ports work correctly when each AGX Thor DevKit is connected to a 10G NIC (Intel 82599) using a QSFP-to-quad-LC breakout cable.

Has anyone seen this behavior before, or does anyone have suggestions about what might be causing it?

Thank you

nmcli | grep mtu

to check whether all four MGBE interfaces are at the same MTU.
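If nmcli is not convenient, the same MTU comparison can be sketched with iproute2. This is only a minimal sketch: the helper name mgbe_mtus is hypothetical, and it assumes the interfaces are named mgbe* as in this thread.

```shell
# Hypothetical helper: reads `ip -o link show` output on stdin and prints
# "<interface> <mtu>" for every interface whose name starts with "mgbe".
mgbe_mtus() {
  awk '{
    name = $2
    sub(/:$/, "", name)            # strip the trailing colon from the name field
    if (name !~ /^mgbe/) next      # ignore non-MGBE interfaces
    for (i = 3; i < NF; i++)
      if ($i == "mtu") { print name, $(i + 1); break }
  }'
}
```

Run it as `ip -o link show | mgbe_mtus` on each DevKit and compare the two outputs.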

sudo dmesg -T | grep -i rcu

to check whether you are hitting an error that others here have seen with a Thor-to-Thor connection, although that was at 4 x 25 Gb.

One thing that might help is to enable pause frames in the device tree as described here.
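Before changing the device tree, it may be worth checking what the interface currently reports for flow control. The sketch below parses `ethtool -a` output. The helper name pause_state is hypothetical, and it assumes the MGBE driver exposes pause parameters through ethtool, which is not confirmed in this thread.

```shell
# Hypothetical helper: reads `ethtool -a <interface>` output on stdin and
# prints the RX/TX pause settings on one line.
pause_state() {
  awk '
    /^RX:/ { rx = $2 }
    /^TX:/ { tx = $2 }
    END    { print "rx=" rx, "tx=" tx }
  '
}
```

For example: `sudo ethtool -a mgbe0_0 | pause_state`, repeated for each MGBE interface on both DevKits.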

The MTU on both systems is identical:

[image: screenshot of the MTU output]

There are no RCU error messages:

[image: screenshot of the dmesg output]

This looks like a link stability problem, as we are seeing periodic carrier-lost messages in dmesg.
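To see which of the four links is flapping and how often, the kernel's per-interface carrier_changes counter in sysfs can be read alongside dmesg. A minimal sketch follows. The helper name carrier_counts is hypothetical, and the optional argument only exists so the function can be pointed at a different directory.

```shell
# Hypothetical helper: prints "<interface> <carrier_changes>" for each mgbe*
# interface. The optional first argument overrides the sysfs root.
carrier_counts() {
  root="${1:-/sys/class/net}"
  for dir in "$root"/mgbe*; do
    # Skip entries without a readable counter (also covers "no match").
    [ -r "$dir/carrier_changes" ] || continue
    printf '%s %s\n' "${dir##*/}" "$(cat "$dir/carrier_changes")"
  done
}
```

Calling `carrier_counts` twice a few minutes apart shows which interfaces keep losing and regaining carrier in that interval.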

journalctl -u systemd-networkd --no-pager
systemctl status systemd-networkd --no-pager

NetworkManager might be a better choice than systemd-networkd for managing these interfaces.

Edit the values to match your settings.
sudo nano /etc/netplan/01-mgbe.yaml

network:
  version: 2
  renderer: networkd

  ethernets:
    mgbe0_0:
      dhcp4: no
      dhcp6: no
      optional: true
      mtu: 1500
      addresses:
        - 192.168.10.10/24

    mgbe1_0:
      dhcp4: no
      dhcp6: no
      optional: true
      mtu: 1500
      addresses:
        - 192.168.11.10/24

    mgbe2_0:
      dhcp4: no
      dhcp6: no
      optional: true
      mtu: 1500
      addresses:
        - 192.168.12.10/24

    mgbe3_0:
      dhcp4: no
      dhcp6: no
      optional: true
      mtu: 1500
      addresses:
        - 192.168.13.10/24

Then apply:

sudo netplan generate
sudo netplan apply

Or, if you want bonding, you will need to compile the bonding.ko kernel module.

Bonded 4x10GbE LACP example
sudo nano /etc/netplan/01-mgbe.yaml

network:
  version: 2
  renderer: networkd

  ethernets:
    mgbe0_0:
      dhcp4: no
      dhcp6: no
      optional: true
      mtu: 1500

    mgbe1_0:
      dhcp4: no
      dhcp6: no
      optional: true
      mtu: 1500

    mgbe2_0:
      dhcp4: no
      dhcp6: no
      optional: true
      mtu: 1500

    mgbe3_0:
      dhcp4: no
      dhcp6: no
      optional: true
      mtu: 1500

  bonds:
    bond0:
      interfaces:
        - mgbe0_0
        - mgbe1_0
        - mgbe2_0
        - mgbe3_0
      dhcp4: no
      dhcp6: no
      mtu: 1500
      addresses:
        - 192.168.100.10/24
      routes:
        - to: default
          via: 192.168.100.1
      nameservers:
        addresses: [1.1.1.1, 8.8.8.8]
      parameters:
        mode: 802.3ad
        lacp-rate: fast
        mii-monitor-interval: 100
        transmit-hash-policy: layer3+4
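
Once the bond is up, the bonding driver reports per-member state in /proc/net/bonding/bond0. The sketch below summarizes it. The helper name bond_members is hypothetical, and it assumes the bond is named bond0 as in the example above.

```shell
# Hypothetical helper: reads /proc/net/bonding/<bond> on stdin and prints
# "<member> <MII status> <link failure count>" for each member interface.
bond_members() {
  awk -F': ' '
    /^Slave Interface:/    { iface = $2 }
    /^MII Status:/         { if (iface != "") status = $2 }   # skip the bond-level status line
    /^Link Failure Count:/ { if (iface != "") print iface, status, $2 }
  '
}
```

For example: `bond_members < /proc/net/bonding/bond0`. A link failure count that keeps rising on one member points at that specific MGBE lane.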