Orin NX 16GB, Jetson 5.1.4: automatically enters recovery mode at startup

The device automatically enters recovery mode at startup. The log is as follows:
** WARNING: Test Key is used. **
…DisplayLocateChildGopHandle: failed to enumerate graphics output device handles: Not Found
UpdatePcieControllersWithGpuDevice: failed to enumerate GPU device handles: Not Found
InstallFdt: Installing Kernel DTB
Processing “L4T Configuration Settings” DTB overlay
Deleting fragment fragment@0
Processing “P3767 Overlay Support” DTB overlay
Deleting fragment fragment@0
Deleting fragment fragment@1
Deleting fragment fragment@2
Deleting fragment fragment@3
Processing “P3768 Overlay Support” DTB overlay
Deleting fragment fragment@0
Deleting fragment fragment@1
UpdateRamOopsMemory: RamOopsBase: 0x46EB70000, RamOopsSize: 0x200000
FtpmProtocol Not Found - Not Found
DisplayLocateChildGopHandle: failed to enumerate graphics output device handles: Not Found
UpdatePcieControllersWithGpuDevice: failed to enumerate GPU device handles: Not Found
[Bds]Booting UEFI KINGSTON OM8PGP41024Q-A0 50026B738300C208
add-symbol-file /out/nvidia/bootloader/uefi/Jetson_RELEASE/Build/Jetson/RELEASE_GCC5/AARCH64/Silicon/NVIDIA/Application/L4TLauncher/L4TLauncher/DEBUG/L4TLauncher.dll 0x45B43C000
Loading driver at 0x0045B43B000 EntryPoint=0x0045B4478AC L4TLauncher.efi

L4TLauncher: Attempting Recovery Boot
Processing “L4T Configuration Settings” DTB overlay

In recovery mode, I dumped some EFI variables:
xxd L4TDefaultBootMode-781e084c-a330-417c-b678-38e696380cb9
00000000: 0700 0000 0100 0000 …
xxd RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
00000000: 0700 0000 FF00 0000 …

After changing RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9 from 0700 0000 FF00 0000 to 0700 0000 0100 0000, the system boots normally again.
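
For reference, the two dumps can be decoded with a short Python sketch. It assumes each efivarfs payload is a 4-byte attribute word followed by a 4-byte little-endian value, which matches the 8 bytes shown above. `decode_efivar` is a hypothetical helper name, and the sketch does not interpret what the values mean to L4TLauncher.

```python
import struct

# UEFI variable attribute bits, as defined by the UEFI specification.
EFI_VARIABLE_NON_VOLATILE = 0x1
EFI_VARIABLE_BOOTSERVICE_ACCESS = 0x2
EFI_VARIABLE_RUNTIME_ACCESS = 0x4


def decode_efivar(raw: bytes):
    """Split an efivarfs payload into (attributes, value).

    Assumption: 4-byte attribute word, then a 4-byte little-endian value.
    """
    if len(raw) < 8:
        raise ValueError("expected at least 8 bytes")
    attributes, value = struct.unpack_from("<II", raw, 0)
    return attributes, value


# Bytes copied from the RootfsStatusSlotA dumps in this post.
status_before = bytes.fromhex("07000000ff000000")
status_after = bytes.fromhex("0700000001000000")

print(decode_efivar(status_before))  # (7, 255)
print(decode_efivar(status_after))   # (7, 1)
```

The attribute word 7 is the combination of the non-volatile, boot-service-access and runtime-access bits, so only the second word differs between the failing and the working state.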

My question is what conditions can cause this phenomenon and how can I avoid it?

Hi,

Could you provide the full serial console log for us to review?

Thanks

Thanks for your reply. Here is the full serial console log:
minicomboot.txt (155.5 KB)

I tried reading the UEFI code:

switch (mRootfsInfo.RootfsVar[RF_REDUNDANCY].Value) {
  case NVIDIA_OS_REDUNDANCY_BOOT_ONLY:
    ErrorPrint (L"ValidateRootfsStatus TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT 6 \r\n");
    // There is no rootfs B. Ensure to set rootfs slot to A.
    if (mRootfsInfo.CurrentSlot != ROOTFS_SLOT_A) {
      mRootfsInfo.CurrentSlot = ROOTFS_SLOT_A;
    }

    // If current slot is bootable, decrease slot RetryCount by 1 and go on boot;
    // if current slot is unbootable, set slot status as unbootable and boot to recovery kernel.
    if (IsRootfsSlotBootable (mRootfsInfo.CurrentSlot)) {
      Status = DecreaseRootfsRetryCount(mRootfsInfo.CurrentSlot);

Each time the machine starts, “RootfsRetryCount” decreases by one. My question is: where is “RootfsRetryCount” restored?
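
The comments in the UEFI excerpt above describe a retry counter that is decremented on every boot. A toy Python model of that behaviour is sketched below. It is not L4TLauncher's actual logic: the maximum of 3, the "count reaches zero means unbootable" rule, and the `os_restore` step are all assumptions made for illustration.

```python
MAX_RETRY_COUNT = 3  # hypothetical; the real maximum is not given in this thread


class RootfsSlot:
    def __init__(self, max_retry=MAX_RETRY_COUNT):
        self.max_retry = max_retry
        self.retry_count = max_retry
        self.unbootable = False

    def bootloader_pass(self):
        """Decrement and boot if bootable, otherwise mark unbootable and use recovery."""
        if self.unbootable or self.retry_count == 0:
            self.unbootable = True
            return "recovery"
        self.retry_count -= 1
        return "normal"

    def os_restore(self):
        """Hypothetical stand-in for whatever restores the count after a good boot."""
        self.retry_count = self.max_retry


def simulate(boots, restore_each_boot):
    """Run a number of boots and return the outcome of each one."""
    slot = RootfsSlot()
    results = []
    for _ in range(boots):
        result = slot.bootloader_pass()
        results.append(result)
        if result == "normal" and restore_each_boot:
            slot.os_restore()
    return results


print(simulate(5, restore_each_boot=True))
print(simulate(5, restore_each_boot=False))
```

In this model, the device keeps booting normally as long as something restores the count after each boot. If nothing restores it, the fourth boot falls through to recovery.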

My system does not have A/B rootfs enabled. After flashing the machine, slot B is in use,
so why is slot A checked here?

I found that PcdRootfsRegisterBaseAddressT234 is 0x0C3903A8, which belongs to the SCRATCH region (0x0c390000 to 0x0c39ffff).
When is this register cleared?
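
As a quick sanity check of the address arithmetic, the following Python sketch confirms that the register lies inside the quoted range and computes its offset from the region base. The constant and function names are hypothetical; only the three numbers come from this post.

```python
SCRATCH_BASE = 0x0C390000
SCRATCH_END = 0x0C39FFFF  # inclusive upper bound, as quoted above
ROOTFS_REGISTER = 0x0C3903A8  # value of PcdRootfsRegisterBaseAddressT234


def offset_in_region(address, base, end):
    """Return the offset of address from base, or raise if it is outside [base, end]."""
    if not base <= address <= end:
        raise ValueError(f"{address:#x} is outside {base:#x}..{end:#x}")
    return address - base


print(hex(offset_in_region(ROOTFS_REGISTER, SCRATCH_BASE, SCRATCH_END)))  # 0x3a8
```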

Hi @DavidDDD, do you have any info that can help me?

Hi,

Sorry for the late update.

Please try the patch below.
You can add this line to /opt/nvidia/l4t-bootloader-config/nv-l4t-bootloader-config.service:

 After=nv.service
 After=nvgetty.service
 After=l4t-rootfs-validation-config.service
+Before=shutdown.target reboot.target halt.target

Then test whether the issue still occurs.
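
If the change needs to be applied to several devices, the one-line patch can be scripted. The Python sketch below works on the unit file's text and inserts the `Before=` line after the last `After=` line. `add_before_shutdown` is a hypothetical helper, and the sample unit text is an illustrative excerpt built from the three `After=` lines in the patch, not the real file contents.

```python
NEW_LINE = "Before=shutdown.target reboot.target halt.target"


def add_before_shutdown(unit_text: str) -> str:
    """Insert NEW_LINE after the last After= line; do nothing if already present."""
    lines = unit_text.splitlines()
    if NEW_LINE in lines:
        return unit_text
    after_indexes = [i for i, line in enumerate(lines) if line.startswith("After=")]
    if not after_indexes:
        raise ValueError("no After= line found in unit text")
    lines.insert(after_indexes[-1] + 1, NEW_LINE)
    return "\n".join(lines) + "\n"


# Illustrative excerpt only; the rest of the real unit file is not shown in this thread.
sample_unit = (
    "[Unit]\n"
    "After=nv.service\n"
    "After=nvgetty.service\n"
    "After=l4t-rootfs-validation-config.service\n"
)

print(add_before_shutdown(sample_unit))
```

After editing a unit file on the device, systemd normally has to reload its configuration (`systemctl daemon-reload`) before the change takes effect.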

Thanks

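Hi @DavidDDD,
Updating the SCRATCH MEM count from a system service is not a good choice. I tested this earlier: running reboot from rc.local reproduces the problem easily. I am not using A/B partitions here, so even if the rootfs is marked unusable, there is no better strategy for the next step. I may move this update into a kernel module, or disable the feature in UEFI. Thanks!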
