TX2 (and TX1) network problems when used with two NICs

Hello!

I am having strange problems with the Jetson TX2 and the TX1 when I connect two network cards. For example, when I use a TX2 on its developer board, connect a USB 3.0 network dongle, and set up a network bridge, the driver crashes or at least throws exceptions as soon as I generate traffic across the bridge.

Sometimes the system recovers, but most of the time it doesn’t and I need to reboot. I would really like to be able to route full GigE traffic through the Jetson every once in a while.

Edit: I am using a custom-compiled 4.4.38 kernel (NVIDIA sources).

dmesg output:

[Dec19 13:02] NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out
[  +0.006559] ------------[ cut here ]------------
[  +0.004666] WARNING: at ffffffc0009a376c [verbose debug info unavailable]
[  +0.006853] Modules linked in: bcmdhd pci_tegra bluedroid_pm

[  +0.007305] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.38 #1
[  +0.005977] Hardware name: quill (DT)
[  +0.003701] task: ffffffc0011ecec0 ti: ffffffc0011dc000 task.ti: ffffffc0011dc000
[  +0.007564] PC is at dev_watchdog+0x2ac/0x2bc
[  +0.004401] LR is at dev_watchdog+0x2ac/0x2bc
[  +0.006035] pc : [<ffffffc0009a376c>] lr : [<ffffffc0009a376c>] pstate: 60000045
[  +0.010773] sp : ffffffc0011dfb40
[  +0.004960] x29: ffffffc0011dfb40 x28: ffffffc1e32b63b8
[  +0.007005] x27: ffffffc0011b0ab8 x26: 0000000000000280
[  +0.006941] x25: 00000000ffffffff x24: 0000000000000000
[  +0.007021] x23: ffffffc1e32b63a0 x22: ffffffc00135e000
[  +0.007024] x21: ffffffc1e4e07c00 x20: ffffffc1e32b6000
[  +0.007015] x19: ffffffc0011e2000 x18: ffffffc000b82f38
[  +0.006966] x17: 00000000a9adee6f x16: ffffffc000b12a60
[  +0.006912] x15: ffffffc000b12a60 x14: 7420302065756575
[  +0.006925] x13: 712074696d736e61 x12: 7274203a29323531
[  +0.006953] x11: 3872282031687465 x10: 203a474f44484354
[  +0.006922] x9 : 0000000000000386 x8 : 0000000000000002
[  +0.006894] x7 : 0000000000000000 x6 : 0000000000000049
[  +0.006850] x5 : 0000000000000000 x4 : 0000000000000000
[  +0.006837] x3 : 0000000000000000 x2 : 0000000000000102
[  +0.006848] x1 : ffffffc0011dc000 x0 : 0000000000000039

[  +0.009820] ---[ end trace e2f3a7c9f7da4dad ]---
[  +0.006114] Call trace:
[  +0.003893] [<ffffffc0009a376c>] dev_watchdog+0x2ac/0x2bc
[  +0.006839] [<ffffffc0001055e4>] call_timer_fn+0x50/0x1bc
[  +0.006795] [<ffffffc000105910>] run_timer_softirq+0x1ac/0x2a4
[  +0.007206] [<ffffffc0000a8974>] __do_softirq+0x10c/0x368
[  +0.006747] [<ffffffc0000a8e28>] irq_exit+0x84/0xdc
[  +0.006198] [<ffffffc0000f45e4>] __handle_domain_irq+0x6c/0xb4
[  +0.007075] [<ffffffc0000815dc>] gic_handle_irq+0x5c/0xb4
[  +0.006646] [<ffffffc000084740>] el1_irq+0x80/0xf8
[  +0.006063] [<ffffffc0007b8594>] cpuidle_enter+0x18/0x20
[  +0.006578] [<ffffffc0000e7a74>] call_cpuidle+0x28/0x50
[  +0.006413] [<ffffffc0000e7c18>] cpu_startup_entry+0x17c/0x340
[  +0.007063] [<ffffffc000b020e0>] rest_init+0x84/0x8c
[  +0.006202] [<ffffffc00109797c>] start_kernel+0x39c/0x3b0
[  +0.006616] [<0000000080b08000>] 0x80b08000
[  +0.005555] r8152 2-1:1.0 eth1: Tx timeout
[  +5.722539] r8152 2-1:1.0 eth1: Tx timeout
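
As a rough aid for reading logs like the one above, here is a minimal Python sketch that counts watchdog firings and Tx timeouts per interface. The sample lines are copied from the dmesg output above; the function name `summarize` and the two regular expressions are only illustrative assumptions about the message format, not part of any kernel tooling.

```python
import re
from collections import Counter

# Sample lines copied from the dmesg output above.
SAMPLE = """\
[Dec19 13:02] NETDEV WATCHDOG: eth1 (r8152): transmit queue 0 timed out
[  +0.005555] r8152 2-1:1.0 eth1: Tx timeout
[  +5.722539] r8152 2-1:1.0 eth1: Tx timeout
"""

# Assumed message shapes, based only on the lines shown in this thread.
WATCHDOG = re.compile(r"NETDEV WATCHDOG: (\S+) \(([^)]+)\)")
TX_TIMEOUT = re.compile(r"(\S+): Tx timeout")


def summarize(log_text):
    """Count watchdog firings and Tx timeouts per interface name."""
    watchdog = Counter()
    timeouts = Counter()
    for line in log_text.splitlines():
        match = WATCHDOG.search(line)
        if match:
            watchdog[match.group(1)] += 1
            continue
        match = TX_TIMEOUT.search(line)
        if match:
            timeouts[match.group(1)] += 1
    return watchdog, timeouts


if __name__ == "__main__":
    watchdog, timeouts = summarize(SAMPLE)
    for name in sorted(set(watchdog) | set(timeouts)):
        print(name, "watchdog:", watchdog[name], "tx timeouts:", timeouts[name])
```

Feeding it a full saved log (for example the text of `dmesg` written to a file) would show whether the timeouts are confined to eth1 or also hit eth0.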

The bridge is configured as follows in network/interfaces:

#
# Bridging dhcp configuration
#

# eth0 and eth1 are bridged together
# Set up interfaces manually, avoiding conflicts with, e.g., network manager
iface eth0 inet manual
iface eth1 inet manual

# Bridge setup
auto br0
iface br0 inet dhcp
        bridge_waitport 10 eth0 eth1
        bridge_ports eth0 eth1

Here is some more information on the network configuration:

$ brctl showstp br0
br0
 bridge id		8000.00044b8d4688
 designated root	8000.00044b8d4688
 root port		   0			path cost		   0
 max age		  20.00			bridge max age		  20.00
 hello time		   2.00			bridge hello time	   2.00
 forward delay		  15.00			bridge forward delay	  15.00
 ageing time		 300.00
 hello timer		   0.00			tcn timer		   0.00
 topology change timer	   0.00			gc timer		 146.62
 flags

eth0 (1)
 port id		8001			state		     forwarding
 designated root	8000.00044b8d4688	path cost		   4
 designated bridge	8000.00044b8d4688	message age timer	   0.00
 designated port	8001			forward delay timer	   0.00
 designated cost	   0			hold timer		   0.00
 flags

eth1 (2)
 port id		8002			state		     forwarding
 designated root	8000.00044b8d4688	path cost		   4
 designated bridge	8000.00044b8d4688	message age timer	   0.00
 designated port	8002			forward delay timer	   0.00
 designated cost	   0			hold timer		   0.00
 flags

$ sudo ethtool eth0
Settings for eth0:
	Supported ports: [ TP MII ]
	Supported link modes:   10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Half 1000baseT/Full
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Half 1000baseT/Full
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Link partner advertised link modes:  10baseT/Half 10baseT/Full
	                                     100baseT/Half 100baseT/Full
	                                     1000baseT/Full
	Link partner advertised pause frame use: No
	Link partner advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 0
	Transceiver: external
	Auto-negotiation: on
	Supports Wake-on: ug
	Wake-on: g
	Link detected: yes
    
$ sudo ethtool eth1
Settings for eth1:
	Supported ports: [ MII ]
	Supported link modes:   10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Full
	Supported pause frame use: No
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Full
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Link partner advertised link modes:  10baseT/Half 10baseT/Full
	                                     100baseT/Half 100baseT/Full
	                                     1000baseT/Half 1000baseT/Full
	Link partner advertised pause frame use: No
	Link partner advertised auto-negotiation: Yes
	Speed: 1000Mb/s
	Duplex: Full
	Port: MII
	PHYAD: 32
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: pumbg
	Wake-on: g
	Current message level: 0x00007fff (32767)
			       drv probe link timer ifdown ifup rx_err tx_err tx_queued intr tx_done rx_status pktdata hw wol
	Link detected: yes

$ sudo lshw -C network
  *-network:0 DISABLED
       description: Ethernet interface
       physical id: 3
       logical name: dummy0
       serial: 6a:16:ac:0a:84:cf
       capabilities: ethernet physical
       configuration: broadcast=yes driver=dummy driverversion=1.0
  *-network:1 DISABLED
       description: Ethernet interface
       physical id: 4
       logical name: wlan0
       serial: 00:04:4b:8d:46:86
       capabilities: ethernet physical
       configuration: broadcast=yes driver=wl driverversion=0 multicast=yes
  *-network:2
       description: Ethernet interface
       physical id: 5
       logical name: eth1
       serial: 00:e0:92:00:19:99
       size: 1Gbit/s
       capacity: 1Gbit/s
       capabilities: ethernet physical mii 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=r8152 driverversion=v2.03.3 (2015/01/29) duplex=full link=yes multicast=yes port=MII speed=1Gbit/s
  *-network:3
       description: Ethernet interface
       physical id: 6
       logical name: eth0
       serial: 00:04:4b:8d:46:88
       size: 1Gbit/s
       capacity: 1Gbit/s
       capabilities: ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt 1000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=eqos duplex=full link=yes multicast=yes port=MII speed=1Gbit/s

I can’t remember the details, but years ago, when working on a bridge with the 8152 driver (I needed to build a lossy/lagged network simulator), I had something similar occur. The issue turned out to be that something was invalid in the bridge setup commands, e.g., an infinite loop of one interface sending to the other interface…and the other interface sending back to the original interface. That makes it difficult for a packet to reach its final destination (infinite routes take a bit of extra time to traverse!).

So I cannot guarantee it, but I suspect you’ll find that some seemingly trivial change to your commands for bringing up the bridge will fix it.
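
As a rough illustration of the kind of loop described above, here is a minimal Python sketch that checks a list of layer-2 links for a cycle using union-find. The device names ("br0", "switch", "pc") and the function `find_loop` are hypothetical and only model the idea; this does not inspect a real bridge, and it is not a claim about what is actually wrong in the setup posted here.

```python
def find_loop(links):
    """Return the first link that closes a loop, or None if there is none.

    links is a list of (device_a, device_b) pairs, one per cable or
    bridged port. Two devices joined by two separate links (e.g. both
    bridge ports plugged into the same switch) count as a loop.
    """
    parent = {}

    def find(node):
        # Root lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # a and b were already connected, so this link closes a cycle.
            return (a, b)
        parent[root_a] = root_b
    return None


if __name__ == "__main__":
    # Hypothetical topologies: the bridge sitting between a PC and a switch,
    # versus both bridge ports cabled into the same switch.
    inline = [("pc", "br0"), ("br0", "switch")]
    doubled = [("br0", "switch"), ("br0", "switch")]
    print("inline:", find_loop(inline))
    print("doubled:", find_loop(doubled))
```

In a topology like the second one, a broadcast frame forwarded out of one port can come back in on the other, which is the back-and-forth traffic pattern mentioned above.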