Tegra B freezes when rebooting Tegra A

I am trying to move some of our workload from the Tegra A SoC to the Tegra B SoC. I am running into an issue where if the Tegra A is rebooted (via sudo reboot), the Tegra B can become unresponsive.

To reproduce this issue:

  1. Flash Tegra A and Tegra B with drive 4.1.8.0L software using DriveInstall_4.1.8.0L_SDK_b43.run
  2. Assign a static IP to both Tegras by editing /etc/network/interfaces. We use 192.168.3.50 for A and 192.168.3.49 for B.
  3. Run ping 192.168.3.49 to see when B is active
  4. Issue the command aurixreset to the AURIX Console. After running this you should see Tegra B stop responding to pings, then come back online when it finishes rebooting.
  5. ssh to Tegra A and run sudo reboot. At this point Tegra B should stop responding to pings and will not recover until power is lost or until aurixreset is run.
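
The ping check in steps 3 to 5 can be automated so the exact moment Tegra B drops off is recorded. The following is only a minimal sketch, assuming a Linux host with the iputils `ping` binary that stays powered during the test; the function names `ping_once`, `find_transitions`, and `monitor` are hypothetical helpers, not part of any NVIDIA tooling.

```python
import subprocess
import time
from typing import Callable, List, Tuple


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Return True if a single ICMP echo to `host` is answered.

    Assumes Linux iputils ping: -c is the packet count, -W the reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_transitions(samples: List[Tuple[float, bool]]) -> List[Tuple[float, str]]:
    """Reduce (timestamp, reachable) samples to the points where the state changes."""
    transitions = []
    previous = None
    for timestamp, reachable in samples:
        if reachable != previous:
            transitions.append((timestamp, "UP" if reachable else "DOWN"))
            previous = reachable
    return transitions


def monitor(
    host: str,
    duration_s: float,
    interval_s: float = 1.0,
    probe: Callable[[str], bool] = ping_once,
) -> List[Tuple[float, str]]:
    """Probe `host` every `interval_s` seconds for `duration_s` seconds."""
    samples = []
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        samples.append((time.monotonic() - start, probe(host)))
        time.sleep(interval_s)
    return find_transitions(samples)
```

For example, calling `monitor("192.168.3.49", duration_s=300)` just before running sudo reboot on Tegra A would return a list of UP/DOWN transitions with offsets in seconds. A final DOWN entry with no later UP would correspond to the frozen state described in step 5.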

I am looking for either (1) a way to recover Tegra B after it enters the non-responsive state, or (2) a way to reset the system that does not cause Tegra B to freeze. We want to avoid using AURIX Console over USB for this purpose because it would require retrofitting our systems. If there is another way to access the console that would work well.

We currently use PX2 Software Version 4.1.8.0L, and it would be difficult for us to switch away from this version. If this is the only way to resolve our issue we will probably instead add the USB cable to each of our systems.

It seems like another user ran into a similar issue in 2018 with a newer version of the software, in the thread "Tegra B sometimes hangs/locks when rebooting Tegra A".

Hi @jchadwell,

Are you able to open the debug console for Tegra B when you execute the reboot command on Tegra A?
(Please refer to the instructions for connecting to Tegra B through ttyUSB6.)

If you could post the output of the Tegra B debug console, check whether it is responsive, and post the output of the ifconfig command after rebooting Tegra A, it might help us better understand the problem.

Are you able to share why it would be difficult for you to upgrade to the latest version released for Drive PX2?

The Tegra B usb console (on ttyUSB6) stops responding at the same moment that Tegra B stops responding to pings. Around 30 seconds after it freezes, the message “[00203361] wdt: expired vmid 0” appeared in the log. Full output at time of crash follows:

[  OK  ] Started Getty on tty1.
[    **] (1 of 2) A start job is running for...lient on reboot (34s / 5min 14s
[   ***] (1 of 2) A start job is running for...lient on reboot (34s / 5min 14s
[  *** ] (1 of 2) A start job is running for...lient on reboot (35s / 5min 14s
[*     ] (1 of 2) A start job is running for...lient on reboot (37s / 5min 14s
[**    ] (1 of 2) A start job is running for...lient on reboot (37s / 5min 14s
[***   ] (1 of 2) A start job is running for...lient on reboot (38s / 5min 14s
[    **] (1 of 2) A start job is running for...lient on reboot (40s / 5min 14s
[     *] (1 of 2) A start job is running for...lient on reboot (40s / 5min 14s
[    **] (1 of 2) A start job is running for...lient on reboot (41s / 5min 14s
[***   ] (1 of 2) A start job is running for...lient on reboot (43s / 5min 14s
[**    ] (1 of 2) A start job is running for...lient on reboot (43s / 5min 14s
[*     ] (1 of 2) A start job is running for...lient on reboot (44s / 5min 14s
[  OK  ] Started LSB: start ptpd client on reboot.
[  OK  ] Created slice User Slice of nvidia.
         Starting User Manager for UID 1001...
[  OK  ] Started Session 1 of user nvidia.
[  OK  ] Started User Manager for UID 1001.
         Stopping User Manager for UID 1001...
[  OK  ] Stopped User Manager for UID 1001.
[  OK  ] Removed slice User Slice of nvidia.
[    **] A start job is running for Setup Wizard (2min 18s / no limit)
[00203361] wdt: expired vmid 0
[00263361] wdt: expired vmid 0
[00323363] w
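
The spacing of the watchdog messages can be pulled out of a captured console log with a few lines of Python. This is a rough sketch: the regular expression is inferred only from the two complete "wdt: expired" lines above, the helper names are hypothetical, and the unit of the bracketed counter is not stated anywhere in this thread (if it were milliseconds, a difference of 60000 would mean one expiry per minute, but that is an assumption).

```python
import re
from typing import List

# Pattern inferred from the log lines above; other log formats are not handled.
WDT_LINE = re.compile(r"\[(\d{8})\] wdt: expired vmid (\d+)")


def wdt_timestamps(log_text: str) -> List[int]:
    """Return the bracketed counter value of every complete watchdog-expiry line."""
    return [int(match.group(1)) for match in WDT_LINE.finditer(log_text)]


def intervals(timestamps: List[int]) -> List[int]:
    """Return the differences between consecutive counter values."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


captured = (
    "[00203361] wdt: expired vmid 0\n"
    "[00263361] wdt: expired vmid 0\n"
    "[00323363] w"  # truncated line, does not match the pattern and is skipped
)
stamps = wdt_timestamps(captured)
print(stamps, intervals(stamps))
```

The truncated third line is deliberately ignored, since the rest of that message is not available.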

Here is the ifconfig output from Tegra A after it reboots (at which point Tegra B is frozen).

nvidia@nvidia:~$ ifconfig
enp3s0    Link encap:Ethernet  HWaddr 00:04:4b:8d:99:68  
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr 00:04:4b:8d:a2:a0  
          inet addr:192.168.3.50  Bcast:192.168.3.255  Mask:255.255.255.0
          inet6 addr: fe80::204:4bff:fe8d:a2a0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:108 errors:0 dropped:0 overruns:0 frame:0
          TX packets:96 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:11429 (11.4 KB)  TX bytes:1061283 (1.0 MB)
          Interrupt:46 

eth0.200  Link encap:Ethernet  HWaddr 00:04:4b:8d:a2:a0  
          inet addr:10.42.0.28  Bcast:10.42.0.255  Mask:255.255.255.0
          inet6 addr: fe80::204:4bff:fe8d:a2a0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:16 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:1284 (1.2 KB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:60 errors:0 dropped:0 overruns:0 frame:0
          TX packets:60 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1 
          RX bytes:4800 (4.8 KB)  TX bytes:4800 (4.8 KB)

Is there any other debug information I can provide to help you understand the problem?

Hi jchadwell,

Can you also post the details on what modifications you made to /etc/network/interfaces so I can try and reproduce?

Hi Luke,

Here are the interfaces files that we use.

Tegra A:

# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d

# Tegra A network configuration
auto eth0
iface eth0 inet static
 address 192.168.3.50
 netmask 255.255.255.0

Tegra B:

# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d

# Tegra B network configuration
auto eth0
iface eth0 inet static
 address 192.168.3.49
 netmask 255.255.255.0
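
As a quick sanity check on the two files above, the addresses can be tested for membership in the same subnet with Python's standard `ipaddress` module. This is only an illustrative sketch; `same_subnet` is a hypothetical helper name.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, netmask: str) -> bool:
    """Return True if addr_b lies in the network defined by addr_a and netmask."""
    # strict=False lets a host address (not just a network address) define the network.
    network = ipaddress.ip_network(f"{addr_a}/{netmask}", strict=False)
    return ipaddress.ip_address(addr_b) in network


# Tegra A and Tegra B static addresses from the interfaces files above.
print(same_subnet("192.168.3.50", "192.168.3.49", "255.255.255.0"))
```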

Dear @jchadwell,
I tested the same steps on the latest release and I don’t see any issue. I could connect to both Tegras via minicom and saw that the eth0 IPs were set as expected.

Hi jchadwell,

Do you see the same issue with reboot if you don’t configure the static IPs? I.e. is the issue associated with the static ip assignment?

Hi @LukeNV

I don’t believe that the static IP is part of the issue. I only included it in my post to help your team replicate exactly what I am seeing. However, I don’t have a great way to test it without configuring an external IP for the board. We typically interact with our boards over the external IP interface (no monitor / keyboard / mouse plugged in).

I will check if the issue replicates without setting the IP configuration on just Tegra B, which I should be able to accomplish by ssh’ing through Tegra A’s external IP then modifying Tegra B over the internal LAN. I’ll post the results of this test here in an hour or so.

I was able to confirm that modifying the networking configuration on Tegra B is not necessary to cause this issue to replicate. It doesn’t seem like the network setup is the root cause of this issue.

I also tested on the latest drive release, and I was unable to reproduce the issue. It seems like some change between version 4.1.8 and version 5.0.10 resolves the bug. We will try to update to the newer version at some point, but it seems like it will take a good deal of rework on our current build system and camera pipeline implementation to bring it up to date.

Can you confirm that there is no way to trigger the same behavior as the “aurixreset” aurix console command without a USB debugging cable attached? This command consistently recovers the board from the frozen state. If there is another way to trigger it that would help us bridge the gap between now and when we are able to update to a newer version.

Dear @jchadwell,
You can connect to the AURIX shell to run aurixreset only via the USB debugging cable (for example, using minicom). As you said, it could be an issue in the older SW version that was fixed in subsequent releases.