The system is stuck in Recovery mode

Hello,

I encountered a strange issue with my Jetson AGX Orin recently and am looking for insights into the root cause.

Environment:

  • Device: Jetson AGX Orin

  • JetPack Version: JetPack 6.2.1

  • Storage: eMMC + NVMe SSD (System installed on NVMe SSD)

Issue Description: The device was working normally. However, after installing a WiFi module into the M.2 Key E slot and rebooting, the system failed to boot into Linux. Instead, it defaulted to a “Recovery Boot” state, dropping me into a bash-5.1# shell environment.

Troubleshooting Steps & Observations:

  1. Check Boot Mode: I checked the L4TDefaultBootMode variable. The value appeared to be 07 00 00 00 00, which technically shouldn’t force recovery mode, yet the system’s behavior suggested otherwise (L4TLauncher was attempting a recovery boot).

  2. UEFI Selection: Manually selecting the kernel/OS from the UEFI Boot Manager still resulted in the system entering the recovery shell.

  3. Config Check: In the bash-5.1# shell, I verified /boot/extlinux/extlinux.conf, and the configuration appeared correct.

  4. Exiting Recovery Loop: I suspected the system was stuck in an OTA recovery state. I successfully switched the system back to “Normal Boot” mode by deleting the OTA-related UEFI variables using the following commands:

    chattr -i /sys/firmware/efi/efivars/L4TDefaultBootMode-781e084c-a330-417c-b678-38e696380cb9
    rm /sys/firmware/efi/efivars/L4TDefaultBootMode-781e084c-a330-417c-b678-38e696380cb9
    
    chattr -i /sys/firmware/efi/efivars/RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
    rm /sys/firmware/efi/efivars/RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
    chattr -i /sys/firmware/efi/efivars/RootfsStatusSlotB-781e084c-a330-417c-b678-38e696380cb9
    rm /sys/firmware/efi/efivars/RootfsStatusSlotB-781e084c-a330-417c-b678-38e696380cb9
    
    
  5. Mount Failure: After forcing the system back to Normal Mode via the steps above, it attempted to boot but failed with a Partition Mount Error. The logs showed USB enumeration errors and an inability to mount the rootfs PARTUUID:

    [    4.543325] Root device found: PARTUUID=02b3c5cf-6cfd-4176-9308-1696f63fd074
    ...
    [    4.672859] usb 1-2: device descriptor read/64, error -71
    ...
    [    6.792659] usb usb1-port2: unable to enumerate USB device
    [   27.926053] ERROR: mounting PARTUUID=02b3c5cf-6cfd-4176-9308-1696f63fd074 as /mnt fail...
    [   27.927860] ERROR: PARTUUID=02b3c5cf-6cfd-4176-9308-1696f63fd074 mount fail...
    [   27.929735] ttyTCU0: Press [ENTER] to start bash in 30 seconds...
    
    
  6. Final Solution: I connected the NVMe SSD to a host PC and replaced the initrd file in /boot/ with a freshly compiled version.

    • Observation: The corrupted (?) initrd on the disk was approximately 0.2MB smaller than the freshly compiled one. The Image and DTB files were identical.

    • Result: After replacing the initrd, the system booted successfully.
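
A note on the value read in step 1: files under /sys/firmware/efi/efivars begin with a 4-byte attribute field, followed by the variable's data. If the bytes 07 00 00 00 00 were dumped straight from that file, the leading 07 00 00 00 would be the attribute flags (non-volatile, boot-service access, runtime access), and only the bytes after them would be the boot mode value. A minimal sketch for printing just the data bytes, assuming od, tr and sed are available in the shell being used (the function name efivar_payload is made up for illustration):

```shell
# Print only the data bytes of an efivarfs file, skipping the
# 4-byte attribute header that efivarfs puts in front of every variable.
# Usage: efivar_payload <path-to-efivarfs-file>
efivar_payload() {
    # od -An: no offset column; -tx1: one hex byte per field; -j4: skip 4 bytes.
    # tr/sed collapse od's padding into single spaces and trim both ends.
    od -An -tx1 -j4 "$1" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
    echo
}
```

For example, `efivar_payload /sys/firmware/efi/efivars/L4TDefaultBootMode-781e084c-a330-417c-b678-38e696380cb9` would print only the bytes that follow the attribute header.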

My Question: It seems the initrd file was corrupted or truncated, which likely caused the initial boot failure, triggering the L4T fallback mechanism to set the Recovery Mode flag.
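
The truncation theory can be tested directly on a suspect file, as a rough sketch. This assumes the initrd is a gzip-compressed cpio archive (the usual L4T layout, but worth confirming with `file` first), in which case a truncated copy should fail gzip's integrity test. The function name check_initrd and the paths in the comments are hypothetical.

```shell
# Return 0 if the file is an intact gzip stream, non-zero if it is
# damaged, truncated, or not gzip at all.
check_initrd() {
    gzip -t "$1" 2>/dev/null
}

# Hypothetical usage, with the NVMe rootfs mounted on a host PC at /mnt/nvme
# and a freshly built initrd in the current directory:
#   check_initrd /mnt/nvme/boot/initrd && echo "gzip stream intact" || echo "damaged or truncated"
#   sha256sum /mnt/nvme/boot/initrd ./initrd    # differing hashes confirm the files differ
```

A size difference alone does not prove corruption, since two builds can legitimately differ; a failing integrity test on the old file would be stronger evidence.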

Does installing a PCIe/USB device (like the M.2 WiFi module) trigger any specific hardware change detection script during boot that might write to the initrd or partition? I am trying to understand how a simple hardware insertion could lead to initrd corruption and this specific “soft brick” state.

Any reply will be appreciated!

hello 2759897880,

To confirm: you had an external NVMe SSD connected via the M.2 Key E slot?
And you removed that NVMe SSD, replaced it with a WiFi module, and then reproduced this failure?

Hello,

Thank you very much for your reply.

The M.2 Key E slot is specifically designed for connecting Wi-Fi modules, and no other peripherals have been connected to it before. The module was installed with the power off. Additionally, I have just been told that this issue occurs not only when a Wi-Fi module is connected to the M.2 Key E interface, but also occasionally when no new peripherals are connected.

Could some user operation have accidentally put the system into Recovery mode or damaged the initrd file?

hello 2759897880,

May I also confirm which WiFi module you’re working with?
Please refer to Topic 297967 for some tweaks to try for confirmation.

Hello, I am using the Intel 8265NGW WiFi module. I will try the method you provided. Thank you very much.
