7/28/2023
NVIDIA Team,
We are having an intermittent booting issue with a Jetson AGX Xavier attached to a Connect Tech Rogue Carrier Board.
The issue is as follows:
Generally speaking, we use our Jetson AGX Xavier + Connect Tech Rogue Carrier Board systems for days or weeks at a time without failure. During this time, we install typical items (Python packages, other benign software, etc.) but we do not modify the kernel or boot process in any way. We typically go through hundreds of power cycles and power the boards with multiple different power systems (different types of batteries, power supplies, etc.), none of which produce any problems.
However, after some time of use (~1-2 weeks), the system fails to boot when powered on. At this point we can cycle power repeatedly but cannot get the system to boot again. We are able to connect to the serial console via Minicom, which produces the following log (minicom_fail.cap):
minicom_fail.cap (10.5 KB)
After the point in the logs where we see multiple blank lines in a row, the system hangs and we are never able to make it past this point. We are not able to access the booting options (i.e. Esc or F11 during boot) as our issue occurs before that point in the booting process.
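A minimal sketch of how the hang point described above could be located in a capture such as minicom_fail.cap. The function name and the threshold of three consecutive blank lines are assumptions for illustration only, and the sample text in the comments is placeholder text, not real boot output.

```python
import sys


def last_output_before_hang(log_text, min_blank_run=3):
    """Return (line_number, text) of the last non-blank line that precedes
    the first run of at least `min_blank_run` blank lines, or None if the
    log contains no such run.

    `min_blank_run` is an assumed threshold; adjust it to match the capture.
    """
    blank_run = 0
    last_nonblank = None
    for number, line in enumerate(log_text.splitlines(), start=1):
        if line.strip():
            # Real output: reset the blank counter and remember this line.
            blank_run = 0
            last_nonblank = (number, line.rstrip())
        else:
            blank_run += 1
            if blank_run >= min_blank_run and last_nonblank is not None:
                return last_nonblank
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python find_hang.py minicom_fail.cap
    with open(sys.argv[1], errors="replace") as capture:
        print(last_output_before_hang(capture.read()))
```

Running this against both the failing and the working capture should show the last message printed before the hang, which can then be searched for in the working log to see what would normally come next.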
We are, however, able to place the system into Force Recovery Mode and flash it, which works as intended and allows us to continue our work. After doing this we clone all of our code, install everything we need, and operate the system without issue for 1-2 weeks before we eventually run into the same issue described above.
Some things of note:
- We have many different Jetson AGX Xaviers and many different Connect Tech Rogue Carrier Boards; this same issue has happened on all of them at some point
- After obtaining a system in the broken state described above, we removed the Rogue Carrier Board from the Jetson AGX Xavier and swapped the modules with a known working combination, at which point we saw the issue “following” the Jetson
  - Ex.
    - Setup 1: Jetson AGX Xavier 1 + Rogue 1 = Broken
    - Setup 2: Jetson AGX Xavier 2 + Rogue 2 = Working as intended
    - Setup 3: Jetson AGX Xavier 1 + Rogue 2 = Broken
    - Setup 4: Jetson AGX Xavier 2 + Rogue 1 = Working as intended
- Here is a serial log from the (working) boot process with this specific combo of items used (minicom_combo.cap):
minicom_combo.cap (71.8 KB)
- We have also seen a similar, but different, issue where everything mentioned above still applies except that, rather than hanging, the system repeatedly attempts to reboot itself in a “reboot loop”. The log for the “reboot loop” can be seen here (minicom_old_fail.cap):
minicom_old_fail.cap (877.7 KB)
- We do not have this specific issue with any systems right now (we instead have the “hanging” issue), but I just wanted to mention this as it feels related.
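The reasoning behind the module/carrier swap test above can be written out as a small Python sketch. The function name and the data layout are hypothetical, chosen only to illustrate the logic; the four results are the Setup 1-4 outcomes listed earlier.

```python
def fault_follows(results):
    """Given {(module, carrier): boots_ok}, return two sorted lists:
    modules that fail with every carrier they were tried on, and
    carriers that fail with every module they were tried on."""
    modules = {module for module, _ in results}
    carriers = {carrier for _, carrier in results}
    bad_modules = sorted(
        m for m in modules
        if all(not ok for (mod, _), ok in results.items() if mod == m)
    )
    bad_carriers = sorted(
        c for c in carriers
        if all(not ok for (_, car), ok in results.items() if car == c)
    )
    return bad_modules, bad_carriers


# Outcomes of Setups 1-4 (True = working as intended, False = broken).
swap_results = {
    ("Jetson AGX Xavier 1", "Rogue 1"): False,
    ("Jetson AGX Xavier 2", "Rogue 2"): True,
    ("Jetson AGX Xavier 1", "Rogue 2"): False,
    ("Jetson AGX Xavier 2", "Rogue 1"): True,
}

print(fault_follows(swap_results))
```

With these four results, only Jetson AGX Xavier 1 fails on every carrier it was paired with, and neither Rogue board fails consistently, which is why we believe the problem lives on the module rather than the carrier.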
Thank you for your help and please let me know if you have any questions!