The AGX Xavier keeps rebooting automatically and then goes into recovery mode

The device had been working fine until one particular power-up. Since then it keeps restarting automatically, each time right after the following output:

[    1.879915] gpiochip0: registered GPIOs 504 to 511 on max77620-gpio
[    2.018647] max77686-rtc max77620-rtc: registered as rtc0
[    2.019540] max77620-power max20024-power: Event recorder REG_NVERC : 0x0
[    2.019827] max77620 4-003c: max77620 probe successful
debugfs initialized
[    2.854499] tegra-se-elp 3ad0000.se_elp: tegra_se_elp_probe: complete
[    2.855547] hid: raw HID events driver (C) Jiri Kosina
[    2.856238] usbcore: registered new interface driver usbhid
[    2.856379] usbhid: USB HID core driver
[    2.858548] tegra186-cam-rtcpu bc00000.rtcpu: Adding to iommu group 4
[    2.859596] tegra186-cam-rtcpu bc00000.rtcpu: Trace buffer configured at IOVA=0xbff00000
[    22.503665] Camera-FW on t194-rce-safe started
TCU early console enabled.
[    22.578454] Camera-FW on t194-rce-safe ready SHA1=9e9c1f28 (crt 0.775 ms, total boot 75.593 ms)
[    2.945178] tegra-ivc-bus bc00000.rtcpu:ivc-bus: region 0: iova=0xbfec0000-0xbfee01ff size=131584
[    2.945783] tegra-ivc-bus bc00000.rtcpu:ivc-bus:echo@0: echo: ver=0 grp=1 RX[16x64]=0x1000-0x1480 TX[16x64]=0x1480-0x1900
[    2.947027] tegra-ivc-bus bc00000.rtcpu:ivc-bus:dbg@1: dbg: ver=0 grp=1 RX[1x448]=0x1900-0x1b40 TX[1x448]=0x1b40-0x1d80
[    2.948285] tegra-ivc-bus bc00000.rtcpu:ivc-bus:dbg@2: dbg: ver=0 grp=1 RX[1x8192]=0x1d80-0x3e00 TX[1x8192]=0x3e00-0x5
[0000.060] W> RATCHET: MB1 binary ratchet value 4 is larger than ratchet level 2 from HW fuses.
[0000.068] I> MB1 (prd-version: 2.6.0.0-t194-41334769-cab45716)
[0000.073] I> Boot-mode: Coldboot
[0000.076] I> Platform: Silicon

Could there be a problem with tegra-ivc-bus? The automatic reboot happens right after the tegra-ivc-bus prints every time.
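One way to check whether the reset really follows the tegra-ivc-bus prints every time is to scan the serial log for the last kernel message seen before each new MB1 banner. The following is only a minimal Python sketch. It assumes that a line containing `I> MB1 (` marks the start of a new boot (as in the excerpt above) and that kernel messages carry a six-decimal timestamp; the function names are hypothetical. Note that the last printed line only shows where the output stopped, which is not necessarily the cause of the reset.

```python
import re
from collections import Counter

# Kernel messages look like "[    2.948285] text"; MB1 lines such as
# "[0000.068] I> ..." have only three decimals and do not match.
KERNEL_LINE = re.compile(r"^\[\s*\d+\.\d{6}\]\s+(.*)$")

# Assumption: this banner is printed once at the start of every boot.
MB1_BANNER = "I> MB1 ("


def last_lines_before_reset(lines):
    """Return the last kernel message seen before each MB1 banner."""
    results = []
    last_kernel = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if MB1_BANNER in line:
            # A new boot starts here; remember where the previous one stopped.
            if last_kernel is not None:
                results.append(last_kernel)
            last_kernel = None
            continue
        match = KERNEL_LINE.match(line)
        if match:
            last_kernel = match.group(1)
    return results


def tally_by_source(messages):
    """Count messages by their first token, usually the driver name."""
    return Counter(msg.split()[0].rstrip(":") for msg in messages if msg.split())
```

To use it, open the captured serial log as text (for example with `errors="replace"`, since serial captures can contain stray bytes), pass the file object to `last_lines_before_reset`, and print the result of `tally_by_source`. If tegra-ivc-bus dominates the tally, the output does stop at the same point on every boot.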

After several reboots, the device enters Recovery Mode:

Jetson UEFI firmware (version 6.0-37391689 built on 2024-08-28T08:47:11+00:00)
ESC   to enter Setup.
F11   to enter Boot Manager Menu.
Enter to continue boot.
**  WARNING: Test Key is used.  **
......ASSERT [VariableRuntimeDxe] /out/nvidia/bootloader/uefi/Jetson_RELEASE/edk2/MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c(3264): !(((INTN)(RETURN_STATUS)(Status)) < 0)

L4TLauncher: Attempting Recovery Boot
EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
EFI stub: Exiting boot services and installing virtual address map...
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x4e0f0040]

What could be causing this device to reboot constantly?

The attached serial log shows in detail how the problem occurs:
[com COM4] (2025-02-26_113008) COM4 (Prolific PL2303GT USB Serial COM Port (COM4))(1) (1).log (12.3 MB)

Please help me!

Hi,

Please share the full log from before the recovery boot happened.

Recovery boot is triggered when the system fails to boot several times in a row. So what we need to check is why the system keeps rebooting, not the recovery boot itself.

Hello, please see the attachment above for the detailed serial output. It records everything printed as the device goes into this failure.

Hi,

To be clear, what we are asking for is the full log of the failed boots that happened "before" the recovery boot.

What you attached is already in recovery boot, and that log looks the same for everyone, because the recovery boot image is a template initrd that we provide. There is no need to check it. The 353 boot records in your log do not help either, because all of them are recovery boots.
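Before uploading a capture, it can be worth checking whether it contains any boot attempt that is not a recovery boot. The sketch below is only an illustration in Python. It assumes that each boot starts with a line containing `I> MB1 (` and that recovery boots print `L4TLauncher: Attempting Recovery Boot`, both taken from the excerpts in this thread; the function names are hypothetical.

```python
# Assumption: printed once at the start of every boot attempt.
BOOT_START = "I> MB1 ("
# Printed by the UEFI launcher when it falls back to the recovery image.
RECOVERY_MARK = "L4TLauncher: Attempting Recovery Boot"


def classify_boots(lines):
    """Split a serial log into boot attempts and flag recovery boots."""
    boots = []
    current = None
    for line in lines:
        if BOOT_START in line:
            current = {"recovery": False, "line_count": 0}
            boots.append(current)
        if current is None:
            # Output before the first banner belongs to an incomplete boot.
            continue
        current["line_count"] += 1
        if RECOVERY_MARK in line:
            current["recovery"] = True
    return boots


def summarize(boots):
    """Return how many boot attempts were recovery boots and how many were not."""
    recovery = sum(1 for boot in boots if boot["recovery"])
    return {"total": len(boots), "recovery": recovery, "normal": len(boots) - recovery}
```

If `summarize` reports `normal` as 0, the capture only contains recovery boots and the log of the original failures still needs to be collected.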

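We are very sorry. We did not have the serial port connected before the device entered recovery boot.
We will keep the serial port connected, try to reproduce the problem, and then send you the full log of the failed boots that occur before the "recovery boot".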