Hello,
I have been using an Xavier AGX 32G with a Connect Tech Rogue carrier card for going on a year with no issues. The system has been very stable and has not exhibited any problems either booting or at run time. I upgraded to the latest JetPack 5.0.2 (Ubuntu 20.04 based) OS about a month ago and had not experienced any problems until now. The system seemed stable.
Xavier AGX 32G
Rogue AGX-101
Jetpack 5.0.2
ConnectTech JetPack 5.0.2 - L4T r35.1.0 BSP
I do regular apt update and apt upgrade runs, as well as reboots, as part of my typical usage, again with no issues. Yesterday I did an apt update/upgrade followed by a reboot and the AGX did not boot again; I tried several power cycles with no success. After several more attempts I connected the debug UART and can see the boot loader attempting to boot the AGX, but failing with what I think is an ASSERT.
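A minimal sketch for scanning a captured debug UART log for the suspected assertion. The search string `ASSERT` is an assumption based on what the failure appears to be, and the helper names `find_marker_lines` and `scan_log` are hypothetical; adjust the marker to whatever the log actually prints.

```python
from typing import Iterable, List, Tuple


def find_marker_lines(lines: Iterable[str], marker: str = "ASSERT") -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing `marker`, case-insensitively."""
    needle = marker.lower()
    hits = []
    for number, line in enumerate(lines, start=1):
        if needle in line.lower():
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path: str, marker: str = "ASSERT") -> List[Tuple[int, str]]:
    """Scan a UART capture on disk; undecodable bytes are replaced rather than raising."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return find_marker_lines(handle, marker)


# Example usage with the attached log (assumed to be in the current directory):
# for number, text in scan_log("20221214_agx_rogue_111_failure_to_boot.log"):
#     print(f"{number}: {text}")
```

Counting the hits per power cycle gives a rough idea of whether every failed attempt stops at the same point.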
Note
I do have some serial devices connected to the 2 ttyTHS* UARTs, but have also disabled the getty in the OS.
crw-rw---- 1 root dialout 238, 0 Apr 15 18:25 /dev/ttyTHS0
crw-rw---- 1 root dialout 238, 1 Apr 15 18:25 /dev/ttyTHS1
I also removed the serial devices but the AGX still remains stuck in the boot loop.
I have some typical software installed on the AGX, CUDA etc., nothing out of the ordinary and no boot loader or kernel modifications. It’s actually a pretty stock configuration beyond what Connect Tech needs to do to get their BSP in place. Nothing has changed over the last month beyond the fresh JetPack 5.0.2 install. If I power cycle the AGX enough times it can boot; I’d say it boots in maybe 1 out of 40 attempts, and in the rest it gets stuck in the boot loader loop. Once I either do a software reboot or kill power, the same no-boot behavior returns. The boot loader goes into a loop, retries some number of times, then gives up. I attached a keyboard to try to get into the boot loader menu, but the AGX was not responsive to the keys at that point.
Has anyone else experienced this behavior on an AGX? I do not see any correlation with recent reboots or apt upgrades, but maybe I am missing something. I’ve ensured that the eMMC flash that contains the OS is not full; it has about 5.2G available (roughly 19% of the 28G partition).
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/mmcblk0p1   28G   21G  5.2G  81% /
none             16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs           3.1G   18M  3.1G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/nvme1n1p1  916G  2.8G  867G   1% /mnt/data2
/dev/nvme0n1p1  916G  210G  661G  25% /mnt/data
tmpfs           3.1G   16K  3.1G   1% /run/user/124
tmpfs           3.1G  8.0K  3.1G   1% /run/user/1000
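The free-space figure for the root filesystem can also be checked programmatically. This is a minimal sketch using the standard library's `shutil.disk_usage`; the helper names `percent_free` and `report` are hypothetical, and the result can differ slightly from the `Use%` column of `df`, which excludes reserved blocks.

```python
import shutil


def percent_free(total_bytes: float, free_bytes: float) -> float:
    """Free space as a percentage of the total filesystem size."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def report(path: str = "/") -> str:
    """Format the free-space percentage for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return f"{path}: {percent_free(usage.total, usage.free):.1f}% free"


if __name__ == "__main__":
    print(report("/"))
```

With the values above (28G total, 5.2G available), `percent_free(28, 5.2)` comes out at a little under 19%.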
For all intents and purposes, the AGX just seems to have gotten into a bad state. The strange thing is that it still manages to boot roughly once in every 40 to 50 attempts.
Any input would be appreciated. Re-flashing the AGX is always an option, but not ideal. I have a lot of time invested in this image and would be concerned that the AGX will get into this same state again in the future. Since this is part of an autonomous system, that is not a good option.
Please see attached log.
Thank you
20221214_agx_rogue_111_failure_to_boot.log (265.6 KB)