Jetson AGX Orin fails to boot with the error "Launcher: Attempting Recovery Boot"

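A device that had been running for a long time suddenly failed to boot normally. The log captured over the serial port is as follows: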


During boot, the error "is_user_key_exists:65 TEE_InvokeTACommand failed with res = 0xffff0001
TLauncher: Attempting Recovery Boot" is reported and the device enters OTA mode. The bash-5.0 console is accessible.
新建会话 (3)_2025-03-06_17-20-35.log (174.5 KB)
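The UEFI menu is currently disabled on this device, so we cannot do anything through the UEFI menu. What is the cause of this failure? Is the hardware damaged?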

Hi,
If the device cannot be flashed or booted, please refer to this page to get the UART log from the device:
Jetson/General debug - eLinux.org
Also collect logs from both the host PC and the Jetson device for reference. If you are using a custom board, you can compare the UART logs of the developer kit and the custom board to get more information.
Also please check FAQs:
Jetson AGX Orin FAQ
If possible, we suggest following the quick start in the developer guide to re-flash the system:
Quick Start — NVIDIA Jetson Linux Developer Guide 1 documentation
And see if the issue still persists on a clean-flashed system.
Thanks!

Your screenshot is too blurry to read. If you want to provide a log, please provide the complete text file, not a screenshot.

Sorry about that. Please see the complete log here:
新建会话 (3)_2025-03-06_17-20-35.log (174.5 KB)

Hi 1712127445,

Did you enable disk encryption on your board (i.e. ROOTFS_ENC=1)?

Can you reproduce this issue?
Or did it just happen suddenly?
If so, have you tried to re-flash the board to check if it could be recovered?

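Disk encryption (ROOTFS_ENC=1) is not enabled.
We cannot reproduce the issue; this is a single unit among our normal production devices that showed this behavior.
We have not re-flashed it yet, because we are worried that doing so would make it impossible to find the cause. What other possibilities or troubleshooting approaches are there?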

The current error log shows that the failure is related to a security feature.
We need the reproduction steps to debug further.

Does a reboot fix the issue, or does it show a similar error during boot?

Do you run any application on the board, or could it have been modified by someone earlier?

The boot may have failed a few times before the board entered this recovery kernel state.
We would like to check the full serial console log of those attempts (i.e. before it entered the recovery kernel).

Do you mean that you have disabled the feature that interrupts the boot to enter the UEFI menu?

1. Rebooting does not fix the problem. This faulty device shows the following log on every reboot:

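2. We do not know what happened before the failure, because we cannot get into the system to view the logs. I understand your point that a reproduction method is needed to solve the problem, but at the moment we can only get into the bash-5.0 console. What information can we collect there to help analyze the cause?
3. We modified the UEFI code to disable the menu feature. The specific change is as follows: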
--- a/Silicon/NVIDIA/Library/PlatformBootManagerLib/PlatformBm.c
+++ b/Silicon/NVIDIA/Library/PlatformBootManagerLib/PlatformBm.c
@@ -1193,7 +1193,7 @@ PlatformBootManagerBeforeConsole (
   //
   // Register platform-specific boot options and keyboard shortcuts.
   //
-  PlatformRegisterOptionsAndKeys ();
+  //PlatformRegisterOptionsAndKeys ();

   //
   // Register EnrollDefaultKeysApp as a SysPrep Option.
@@ -1546,7 +1546,7 @@ PlatformBootManagerAfterConsole (
   //
   // Display system and hotkey information after console is ready.
   //
-  DisplaySystemAndHotkeyInformation ();
+  //DisplaySystemAndHotkeyInformation ();
4. In the bash-5.0 console I can mount the rootfs. What information can I collect after mounting it?

They are necessary for us to analyze what caused the boot to fail before the board entered the recovery kernel.

Okay, I think the current slot may be in an unbootable state, but because you have disabled the UEFI menu you cannot recover it manually.

Please also refer to the instructions in Set the UEFI Variable in the Recovery Kernel Shell to check whether they help to recover the board.
Or check whether you hit any error during boot after configuring the variables.

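Following the method in Set the UEFI Variable in the Recovery Kernel Shell, I set Rootfs A in bash-4 and the board then booted into slot A (we use A/B partitions and slot A is the default; in this failure the board had entered recovery mode instead). Here is the boot log from successfully entering slot A: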
恢复模式改A启动.log (221.0 KB)
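I checked all the logs under /var/log and the shell history, and found no operation related to this failure.
What other data under the rootfs can be analyzed? In this failure, every power-on ends up in recovery mode. Could something in the external hardware cause this?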

After the slot A root filesystem is entered normally, a kernel error is triggered almost immediately. The error is as follows:

Ubuntu 20.04.6 LTS tegra-ubuntu ttyTCU0

tegra-ubuntu login: [   34.229169] Unable to handle kernel paging request at virtual address 0003ad9ada9477a3
[   34.229453] Mem abort info:
[   34.229541]   ESR = 0x96000004
[   34.229656]   EC = 0x25: DABT (current EL), IL = 32 bits
[   34.229843]   SET = 0, FnV = 0
[   34.229936]   EA = 0, S1PTW = 0
[   34.230037] Data abort info:
[   34.230129]   ISV = 0, ISS = 0x00000004
[   34.230246]   CM = 0, WnR = 0
[   34.230351] [0003ad9ada9477a3] address between user and kernel address ranges
[   34.230582] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[   34.230753] Modules linked in: gpio(OE) xt_multiport(E) ip6table_filter(E) ip6_tables(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) nfnetlink(E) iptable_nat(E) nf_nat(E) xt_addrtype(E) br_netfilter(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) xt_tcpudp(E) iptable_filter(E) lzo_rle(E) lzo_compress(E) zram(E) overlay(E) ramoops(E) reed_solomon(E) snd_soc_tegra186_asrc(E) snd_soc_tegra210_iqc(E) snd_soc_tegra210_ope(E) snd_soc_tegra186_dspk(E) snd_soc_tegra210_mvc(E) snd_soc_tegra186_arad(E) snd_soc_tegra210_afc(E) snd_soc_tegra210_adx(E) snd_soc_tegra210_dmic(PCKXT) snd_soc_tegra210_amx(E) snd_soc_tegra210_mixer(E) snd_soc_tegra210_i2s(E) snd_soc_tegra210_admaif(E) snd_soc_tegra_pcm(E) snd_soc_tegra210_sfc(E) aes_ce_blk(E) crypto_simd(E) snd_soc_tegra210_adsp(E) cryptd(E) snd_soc_tegra_machine_driver(E) aes_ce_cipher(E) ucsi_ccg(E) cam_cdi_tsc(E) ghash_ce(E) typec_ucsi(E) nv_hawk_owl(E)
[   34.230893]  snd_soc_tegra_utils(E) sha2_ce(E) binfmt_misc(E) sha256_arm64(E) sha1_ce(E) snd_soc_simple_card_utils(E) snd_soc_spdif_tx(E) max96712(E) typec(E) nct1008(E) userspace_alert(E) nvadsp(E) snd_soc_tegra210_ahub(E) i2c_nvvrs11(E) tegra_bpmp_thermal(E) tegra210_adma(E) snd_hda_codec_hdmi(E) nvidia(OE) snd_hda_tegra(E) snd_soc_rt5640(E) snd_hda_codec(E) snd_soc_rl6231(E) snd_hda_core(E) spi_tegra114(E) ina3221(E) pwm_fan(E) nvgpu(E) nvmap(E) ip_tables(E) x_tables(E)
[   34.296093] Unable to handle kernel paging request at virtual address eca3c4a40502f227
[   34.296329] Unable to handle kernel paging request at virtual address 0038420aaadf8047
[   34.296331] Mem abort info:
[   34.296332]   ESR = 0x96000004
[   34.296334]   EC = 0x25: DABT (current EL), IL = 32 bits
[   34.296334]   SET = 0, FnV = 0
[   34.296335]   EA = 0, S1PTW = 0
[   34.296335] Data abort info:
[   34.296336]   ISV = 0, ISS = 0x00000004
[   34.296337]   CM = 0, WnR = 0
[   34.296338] [0038420aaadf8047] address between user and kernel address ranges
[   34.313925] CPU: 2 PID: 1267 Comm: nxserver.bin Tainted: G           OE     5.10.120-tegra #9
[   34.313928] Hardware name: Unknown Jetson AGX Orin Developer Kit/Jetson AGX Orin Developer Kit, BIOS 202210.3-ce6d1ddd-dirty 04/23/2024
[   34.313932] pstate: 60400009 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[   34.313946] pc : _raw_spin_lock+0x40/0x80
[   34.313950] lr : _raw_spin_lock+0x18/0x80
[   34.313952] sp : ffff80001af63cb0
[   34.313954] x29: ffff80001af63cb0 x28: ffff6df7d0ae1d80 
[   34.322074] Mem abort info:
[   34.330194] x27: 0000000000000000 x26: 4e03ad9ada9477a3 
[   34.330199] x25: 0000000000020000 x24: 0000000000100000 
[   34.330203] x23: 0000000000000000 x22: ffff6df7c1441000 
[   34.330207] x21: 4e03ad9ada947723 x20: 00000000ffffff9c 
[   34.330211] x19: 4e03ad9ada9477a3 x18: 0000000000000000 
[   34.330215] x17: 0000000000000000 x16: 0000000000000000 
[   34.330228] x15: 0000000000000000 
[   34.330604] Unable to handle kernel paging request at virtual address 005f94785c2f750e
[   34.330606] Mem abort info:
[   34.330607]   ESR = 0x96000004
[   34.330610]   EC = 0x25: DABT (current EL), IL = 32 bits
[   34.330611]   SET = 0, FnV = 0
[   34.330612]   EA = 0, S1PTW = 0
[   34.330612] Data abort info:
[   34.330613]   ISV = 0, ISS = 0x00000004
[   34.330614]   CM = 0, WnR = 0
[   34.330614] [005f94785c2f750e] address between user and kernel address ranges
[   34.333091]   ESR = 0x96000004
[   34.336231] x14: 0000000000000000 
[   34.336233] x13: 0000000000000000 x12: 00000000000000e0 
[   34.336236] x11: ffff6df7d0ae1d80 x10: ffff6df7d0ae1d80 
[   34.336239] x9 : 0000000000000000 x8 : fefefefefefefeff 
[   34.336242] x7 : ffffbd3d6f6ad000 x6 : ffff6df7d0ae1d80 
[   34.341752]   EC = 0x25: DABT (current EL), IL = 32 bits
[   34.344892] x5 : 0000ffffffffffff x4 : 7fffffffffffffff 
[   34.344894] x3 : 0000000000000000 x2 : 0000000000000001 
[   34.344896] x1 : 0000000000000000 x0 : 4e03ad9ada9477a3 
[   34.344899] Call trace:
[   34.344902]  _raw_spin_lock+0x40/0x80
[   34.344912]  __alloc_fd+0x3c/0x1e0
[   34.348049]   SET = 0, FnV = 0
[   34.350931]  get_unused_fd_flags+0x34/0x40
[   34.354876]   EA = 0, S1PTW = 0
[   34.358021]  do_sys_openat2+0x18c/0x2b0
[   34.358023]  do_sys_open+0x7c/0xd0
[   34.358026]  __arm64_sys_openat+0x2c/0x40
[   34.358032]  el0_svc_common.constprop.0+0x7c/0x1c0
[   34.365372] Data abort info:
[   34.374030]  do_el0_svc+0x34/0xa0
[   34.374034]  el0_svc+0x1c/0x30
[   34.374035]  el0_sync_handler+0xa8/0xb0
[   34.374039]  el0_sync+0x16c/0x180
[   34.386283]   ISV = 0, ISS = 0x00000004
[   34.392145] Code: 52800001 aa1303e0 52800022 2a0103e3 (88e37e62) 
[   34.392151] ---[ end trace ceeced57b3ab59e2 ]---
[   34.396344]   CM = 0, WnR = 0
[   34.403910] Kernel panic - not syncing: Oops: Fatal exception
[   34.408944] [eca3c4a40502f227] address between user and kernel address ranges
[   34.411746] SMP: stopping secondary CPUs
[   35.440230] SMP: failed to stop secondary CPUs 2,5,7
[   35.440233] Kernel Offset: 0x3d3d5d690000 from 0xffff800010000000
[   35.440233] PHYS_OFFSET: 0xffff920940000000
[   35.440235] CPU features: 0x08040006,4a80aa38
[   35.440236] Memory Limit: none
[   35.584002] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
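
As a reading aid for the log above, the ESR value and the faulting addresses can be decoded by hand. The following is a minimal Python sketch, not an NVIDIA tool. The function names (`decode_esr`, `classify_va`, `fault_addresses`) are hypothetical, and the 48-bit virtual address width is an assumption about this kernel's configuration. The bit layout used is the Arm ESR_ELx one: EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts DFSC in ISS bits 5:0 and WnR in ISS bit 6.

```python
import re

def decode_esr(esr: int) -> dict:
    """Split an arm64 ESR_ELx value into the fields the kernel prints."""
    iss = esr & 0x1FFFFFF               # ISS: bits 24:0
    return {
        "ec": (esr >> 26) & 0x3F,       # exception class; 0x25 = data abort, current EL
        "il": (esr >> 25) & 0x1,        # 1 = 32-bit instruction
        "iss": iss,
        "dfsc": iss & 0x3F,             # data fault status code (data aborts only)
        "wnr": (iss >> 6) & 0x1,        # 0 = read access, 1 = write access
    }

# Assumption: the kernel uses 48-bit virtual addresses.
VA_BITS = 48

def classify_va(addr: int) -> str:
    """Classify a 64-bit virtual address by its upper (64 - VA_BITS) bits."""
    top = addr >> VA_BITS
    if top == 0:
        return "user"
    if top == (1 << (64 - VA_BITS)) - 1:
        return "kernel"
    return "non-canonical"

FAULT_RE = re.compile(r"paging request at virtual address ([0-9a-f]{16})")

def fault_addresses(log_text: str) -> list:
    """Return every faulting address found in a serial console log."""
    return [int(m, 16) for m in FAULT_RE.findall(log_text)]

if __name__ == "__main__":
    sample = (
        "[   34.229169] Unable to handle kernel paging request at virtual address 0003ad9ada9477a3\n"
        "[   34.296093] Unable to handle kernel paging request at virtual address eca3c4a40502f227\n"
    )
    print(decode_esr(0x96000004))
    for addr in fault_addresses(sample):
        print(f"{addr:016x}", classify_va(addr))
```

Under these assumptions, 0x96000004 decodes to EC 0x25, IL 1, DFSC 0x04 (a level 0 translation fault) on a read, which agrees with the fields the kernel printed. All four faulting addresses in the log fall into the "non-canonical" class, which is what the kernel means by "address between user and kernel address ranges". That pattern is consistent with corrupted pointers, but on its own it does not tell a software bug apart from faulty memory.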

Do you know what causes this kernel panic issue?
Could it be caused by any of your customizations or modifications?

It seems this is why it enters the recovery kernel. (Since there is a kernel panic that may trigger a reset after 120 s, the board will enter the recovery kernel after 3 trials.)
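
The fallback described above can be pictured with a toy model. This is only a sketch of the behavior as stated in this reply, not NVIDIA's bootloader code: the class `BootState`, the function `simulate`, and the rule that a healthy boot resets the counter are all hypothetical, and only the limit of 3 trials comes from the thread.

```python
from dataclasses import dataclass

MAX_BOOT_TRIALS = 3  # "after 3 trials", as stated in the reply above


@dataclass
class BootState:
    """Hypothetical persistent counter of consecutive failed boots."""
    failed_trials: int = 0

    def next_target(self) -> str:
        # Once the trial budget is used up, fall back to the recovery kernel.
        if self.failed_trials >= MAX_BOOT_TRIALS:
            return "recovery-kernel"
        return "rootfs"

    def record_boot(self, healthy: bool) -> None:
        # Assumption: a healthy boot clears the counter, a failed one increments it.
        self.failed_trials = 0 if healthy else self.failed_trials + 1


def simulate(outcomes):
    """Replay boot outcomes (True = healthy) and list what each boot targeted."""
    state = BootState()
    targets = []
    for healthy in outcomes:
        target = state.next_target()
        targets.append(target)
        if target == "recovery-kernel":
            break
        state.record_boot(healthy)
    return targets


if __name__ == "__main__":
    # Every boot panics, as on the affected unit.
    print(simulate([False, False, False, False]))
```

In this model, a unit whose kernel panics on every boot targets the rootfs three times and then lands in the recovery kernel on the fourth power cycle, which matches the symptom of always ending up at the recovery shell.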

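Indeed, this panic is what causes the system to enter recovery mode. Manually switching to slot A or slot B triggers the same panic after the root filesystem is entered, and after the 120 s reset has happened three times the board enters bash-4.
We also tried flashing over USB. Flashing fails as well, with the same error. Does this mean the Orin module has a hardware problem? Is the memory faulty?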

There has been no update from you for a while, so we assume this is no longer an issue.
Hence, we are closing this topic. If you need further support, please open a new one.
Thanks
~0408

Do you mean it could not be recovered through re-flash?
If it cannot be flashed, please share the full flash log for further check.

Is the issue specific to the current module?
Are you using the devkit or a custom board for AGX Orin?
