Hello,
I encountered a strange issue with my Jetson AGX Orin recently and am looking for insights into the root cause.
Environment:
-
Device: Jetson AGX Orin
-
JetPack Version: Jetpack6.2.1
-
Storage: eMMC + NVMe SSD (System installed on NVMe SSD)
Issue Description: The device was working normally. However, after installing a WiFi module into the M.2 Key E slot and rebooting, the system failed to boot into Linux. Instead, it defaulted to a “Recovery Boot” state, dropping me into a bash-5.1# shell environment.
Troubleshooting Steps & Observations:
-
Check Boot Mode: I checked the
L4TDefaultBootModevariable. The value appeared to be07 00 00 00 00, which technically shouldn’t force a recovery mode, yet the system behaviors suggested otherwise (L4TLauncher was attempting recovery boot). -
UEFI Selection: Manually selecting the kernel/OS from the UEFI Boot Manager still resulted in the system entering the recovery shell.
-
Config Check: In the
bash-5.1#shell, I verified/boot/extlinux/extlinux.conf, and the configuration appeared correct. -
Exiting Recovery Loop: I suspected the system was stuck in an OTA recovery state. I successfully switched the system back to “Normal Boot” mode by deleting the OTA-related UEFI variables using the following commands:
chattr -i /sys/firmware/efi/efivars/L4TDefaultBootMode-781e084c-a330-417c-b678-38e696380cb9 rm /sys/firmware/efi/efivars/L4TDefaultBootMode-781e084c-a330-417c-b678-38e696380cb9 chattr -i /sys/firmware/efi/efivars/RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9 rm /sys/firmware/efi/efivars/RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9 chattr -i /sys/firmware/efi/efivars/RootfsStatusSlotB-781e084c-a330-417c-b678-38e696380cb9 rm /sys/firmware/efi/efivars/RootfsStatusSlotB-781e084c-a330-417c-b678-38e696380cb9 -
Mount Failure: After forcing the system back to Normal Mode via the steps above, it attempted to boot but failed with a Partition Mount Error. The logs showed USB enumeration errors and an inability to mount the rootfs PARTUUID:
[ 4.543325] Root device found: PARTUUID=02b3c5cf-6cfd-4176-9308-1696f63fd074 ... [ 4.672859] usb 1-2: device descriptor read/64, error -71 ... [ 6.792659] usb usb1-port2: unable to enumerate USB device [ 27.926053] ERROR: mounting PARTUUID=02b3c5cf-6cfd-4176-9308-1696f63fd074 as /mnt fail... [ 27.927860] ERROR: PARTUUID=02b3c5cf-6cfd-4176-9308-1696f63fd074 mount fail... [ 27.929735] ttyTCU0: Press [ENTER] to start bash in 30 seconds... -
Final Solution: I connected the NVMe SSD to a host PC and replaced the
initrdfile in/boot/with a freshly compiled version.-
Observation: The corrupted (?)
initrdon the disk was approximately 0.2MB smaller than the freshly compiled one. TheImageand DTB files were identical. -
Result: After replacing the
initrd, the system booted successfully.
-
My Question: It seems the initrd file was corrupted or truncated, which likely caused the initial boot failure, triggering the L4T fallback mechanism to set the Recovery Mode flag.
Does installing a PCIe/USB device (like the M.2 WiFi module) trigger any specific hardware change detection script during boot that might write to the initrd or partition? I am trying to understand how a simple hardware insertion could lead to initrd corruption and this specific “soft brick” state.
Any reply will be appreciated!