Unhandled Exception in EL3 in UEFI during boot

Hi

On reboot, a device crashed during the UEFI phase; the system hung and needed a power cycle to recover. What could cause this? Is there a watchdog that monitors the boot process during the UEFI phase? Logs are attached below.

I'm using a custom carrier board for AGX Orin, with Jetson Linux R36.3.

Log attached:
boot_err_NX.txt (21.1 KB)


[     5.773990] Camera-FW on t234-rce-safe started
TCU early console enabled.

[     5.821546] Camera-FW on t234-rce-safe ready SHA1=e2238c99 (crt 0.911 ms, total boot 48.517 ms)
E/TC:?? 00 get_rpc_alloc_res:645 RPC allocation failed. Non-secure world result: ret=0xffff0000 ret_origin=0
E/LD:   init_elf:486 sys_open_ta_bin(bc50d971-d4c9-42c4-82cb-343fb7f37896)
E/TC:?? 00 ldelf_init_with_ldelf:131 ldelf failed with res: 0xffff000c
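For reference, the OP-TEE messages above report GlobalPlatform TEE result codes. The sketch below is a partial lookup table (values as defined in the GlobalPlatform TEE API specifications; the helper name is hypothetical) showing that `ret=0xffff0000` is `TEE_ERROR_GENERIC` and the ldelf result `0xffff000c` is `TEE_ERROR_OUT_OF_MEMORY`. It only names the codes; the log alone does not establish whether these OP-TEE errors are related to the EL3 exception that follows.

```python
# Partial table of GlobalPlatform TEE result codes, enough to read the OP-TEE lines above.
TEE_RESULTS = {
    0x00000000: "TEE_SUCCESS",
    0xFFFF0000: "TEE_ERROR_GENERIC",
    0xFFFF0006: "TEE_ERROR_BAD_PARAMETERS",
    0xFFFF0008: "TEE_ERROR_ITEM_NOT_FOUND",
    0xFFFF000C: "TEE_ERROR_OUT_OF_MEMORY",
    0xFFFF000E: "TEE_ERROR_COMMUNICATION",
}


def tee_result_name(code: int) -> str:
    """Return the symbolic name for a TEE result code, or a hex fallback."""
    return TEE_RESULTS.get(code, f"unknown (0x{code:08x})")


if __name__ == "__main__":
    # Values copied from the log: the RPC allocation result and the ldelf result.
    for label, code in (("get_rpc_alloc_res ret", 0xFFFF0000),
                        ("ldelf res", 0xFFFF000C)):
        print(f"{label}: 0x{code:08x} -> {tee_result_name(code)}")
```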


Unhandled Exception in EL3.
x30            = 0x0000000050000d1c
x0             = 0x0000000000000000
x1             = 0x00000000be000011
x2             = 0x0000000000000011
x3             = 0x0000000000000000
x4             = 0x0000000000100000
x5             = 0x00000004723fe5a8
x6             = 0x0000000001000000
x7             = 0x0000000001000000
x8             = 0x00000000000000ff
x9             = 0x000000005001ae40
x10            = 0x55aaa055071756f5
x11            = 0x55aa8255bbfabfb1
x12            = 0x0a0341d0000c0102
x13            = 0x0004ff7f00000000
x14            = 0x000000046b02b260
x15            = 0x000000046b02b310
x16            = 0x000000046b8bacf8
x17            = 0x0000000000000067
x18            = 0x000000046b8c6310
x19            = 0x000000005001b980
x20            = 0x0000000000000000
x21            = 0x0000000000000000
x22            = 0x0000000000000000
x23            = 0x0000000000000000
x24            = 0x0000000000000000
x25            = 0x0000000000000000
x26            = 0x0000000000000000
x27            = 0x0000000000000000
x28            = 0x0000000000000000
x29            = 0x0000000000000000
scr_el3        = 0x000000000003073d
sctlr_el3      = 0x00000000b0cd183f
cptr_el3       = 0x0000000000000000
tcr_el3        = 0x0000000080823518
daif           = 0x00000000000002c0
mair_el3       = 0x00000000004404ff
spsr_el3       = 0x00000000600003c9
elr_el3        = 0x000000046b8c0a80
ttbr0_el3      = 0x0000000050025581
esr_el3        = 0x00000000be000011
far_el3        = 0x0000000000000000
spsr_el1       = 0x0000000000000000
elr_el1        = 0x0000000000000000
spsr_abt       = 0x0000000000000000
spsr_und       = 0x0000000000000000
spsr_irq       = 0x0000000000000000
spsr_fiq       = 0x0000000000000000
sctlr_el1      = 0x0000000030d00800
actlr_el1      = 0x0000000000000000
cpacr_el1      = 0x0000000000300000
csselr_el1     = 0x0000000000000004
sp_el1         = 0x0000000000000000
esr_el1        = 0x0000000000000000
ttbr0_el1      = 0x0000000000000000
ttbr1_el1      = 0x0000000000000000
mair_el1       = 0x0000000000000000
amair_el1      = 0x0000000000000000
tcr_el1        = 0x0000000000000000
tpidr_el1      = 0x0000000000000000
tpidr_el0      = 0x00000000c2000000
tpidrro_el0    = 0x0000000000000000
par_el1        = 0x0000000000000800
mpidr_el1      = 0x0000000081000000
afsr0_el1      = 0x0000000000000000
afsr1_el1      = 0x0000000000000000
contextidr_el1 = 0x0000000000000000
vbar_el1       = 0x0000000000000000
cntp_ctl_el0   = 0x0000000000000005
cntp_cval_el0  = 0x000000000caefb6a
cntv_ctl_el0   = 0x0000000000000000
cntv_cval_el0  = 0x0000000000000000
cntkctl_el1    = 0x0000000000000000
sp_el0         = 0x000000046b8c6310
isr_el1        = 0x0000000000000040
cpuectlr_el1   = 0xa000000b40543000
gicd_ispendr regs (Offsets 0x200 - 0x278)
 Offset:                        value
0000000000000200:               0x0000000000000000
0000000000000204:               0x0000000000000000
0000000000000208:               0x0000000000000000
000000000000020c:               0x0000000000000000
0000000000000210:               0x0000000000000000
0000000000000214:               0x0000000000000000
0000000000000218:               0x0000000000010000
000000000000021c:               0x0000000000020000
0000000000000220:               0x0000000000000000
0000000000000224:               0x0000000000000000
0000000000000228:               0x0000000000000000
000000000000022c:               0x0000000000000000
0000000000000230:               0x0000000000000000
0000000000000234:               0x0000000000000000
0000000000000238:               0x0000000000000000
000000000000023c:               0x0000000000000000
0000000000000240:               0x0000000000000000
0000000000000244:               0x0000000000000000
0000000000000248:               0x0000000000000000
000000000000024c:               0x0000000000000000
0000000000000250:               0x0000000000000000
0000000000000254:               0x0000000000000000
0000000000000258:               0x0000000000000000
000000000000025c:               0x0000000000000000
0000000000000260:               0x0000000000000000
0000000000000264:               0x0000000000000000
0000000000000268:               0x0000000000000000
000000000000026c:               0x0000000000000000
0000000000000270:               0x0000000000000000
0000000000000274:               0x0000000000000000
0000000000000278:               0x0000000000000000
000000000000027c:               0x0000000000000000
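Some of the register values above can be decoded by hand. Below is a minimal sketch, assuming the AArch64 field layouts from the Arm Architecture Reference Manual and the GICv3 GICD_ISPENDR<n> layout; the lookup tables are partial, the helper names are hypothetical, and the AET interpretation assumes the core implements the RAS extension. Applied to this dump, it reports ESR_EL3 = 0xbe000011 as EC 0x2F (SError interrupt) with DFSC 0x11 (asynchronous SError) and AET "Uncontainable", SPSR_EL3 = 0x600003c9 as taken from EL2h with DAIF fully masked, SCR_EL3.EA = 1 (external aborts and SErrors routed to EL3), and pending INTIDs 208 and 241.

```python
# Minimal decoder for a few fields in the TF-A "Unhandled Exception in EL3" dump.
# Field layouts follow the Arm ARM (AArch64); tables are partial, names are hypothetical.

# ESR_ELx.EC values (subset).
EC_NAMES = {
    0x00: "Unknown reason",
    0x15: "SVC from AArch64",
    0x16: "HVC from AArch64",
    0x17: "SMC from AArch64",
    0x20: "Instruction Abort from a lower EL",
    0x24: "Data Abort from a lower EL",
    0x2F: "SError interrupt",
}

# SPSR_ELx.M[3:0] when M[4] == 0 (exception taken from AArch64).
MODE_NAMES = {0x0: "EL0t", 0x4: "EL1t", 0x5: "EL1h", 0x8: "EL2t",
              0x9: "EL2h", 0xC: "EL3t", 0xD: "EL3h"}

# SError ISS.AET (assumes FEAT_RAS; meaningful when DFSC == 0x11).
AET_NAMES = {0b000: "Uncontainable (UC)", 0b001: "Unrecoverable (UEU)",
             0b010: "Restartable (UEO)", 0b011: "Recoverable (UER)",
             0b110: "Corrected (CE)"}


def decode_esr(esr: int) -> dict:
    ec = (esr >> 26) & 0x3F
    iss = esr & 0x1FFFFFF
    out = {"EC": ec, "EC_name": EC_NAMES.get(ec, "other"),
           "IL": (esr >> 25) & 1, "ISS": iss}
    if ec == 0x2F and not (iss >> 24) & 1:  # IDS == 0: architected SError syndrome
        out["DFSC"] = iss & 0x3F            # 0x11 = asynchronous SError interrupt
        out["EA"] = (iss >> 9) & 1
        out["AET"] = AET_NAMES.get((iss >> 10) & 0x7, "reserved")
    return out


def decode_spsr(spsr: int) -> dict:
    aarch32 = (spsr >> 4) & 1
    mode = "AArch32" if aarch32 else MODE_NAMES.get(spsr & 0xF, "reserved")
    return {"mode": mode, "DAIF_masked": (spsr >> 6) & 0xF}


def decode_scr_el3(scr: int) -> dict:
    # NS: lower ELs are Non-secure; IRQ/FIQ/EA: route those exceptions to EL3.
    return {"NS": scr & 1, "IRQ": (scr >> 1) & 1,
            "FIQ": (scr >> 2) & 1, "EA": (scr >> 3) & 1}


def pending_intids(ispendr: dict) -> list:
    """Map {GICD offset: value} for GICD_ISPENDR<n> (offset 0x200 + 4*n) to INTIDs."""
    intids = []
    for offset, value in sorted(ispendr.items()):
        n = (offset - 0x200) // 4
        intids += [32 * n + bit for bit in range(32) if (value >> bit) & 1]
    return intids


if __name__ == "__main__":
    print(decode_esr(0xBE000011))
    print(decode_spsr(0x600003C9))
    print(decode_scr_el3(0x3073D))
    ids = pending_intids({0x218: 0x00010000, 0x21C: 0x00020000})
    print("INTIDs:", ids, "SPIs:", [i - 32 for i in ids])
```

Because the SError is asynchronous, ELR_EL3 (0x46b8c0a80) marks where the EL2 code was interrupted, not necessarily the access that caused the error. Device-tree `interrupts` properties use SPI numbers (INTID minus 32), so INTIDs 208 and 241 correspond to SPIs 176 and 209 when searching the T234 device tree.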

Hi hanyang369,

Could it boot normally before?
If so, what did you do before hitting this issue?
Is the issue specific to the current device?

I would suggest using a debug UEFI binary to capture detailed logs.

Hi KevinFFF,

Could it boot normally before?

Yes, it booted normally before. After hitting this issue, it returned to normal once it was power-cycled.

If so, what did you do before hitting this issue?

Nothing specific, just a normal reboot. However, the device temperature was high.

Is the issue specific to the current device?

So far it has happened only once, on this device.

I would suggest using a debug UEFI binary to capture detailed logs.

Could you provide the steps or any references I can follow?

Thanks

Okay, so it can be recovered by a power cycle.
Do you know how to reproduce the issue? What is the failure rate?

Do you hit the issue when the temperature is normal?
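One way to answer the reproduction and failure-rate questions is to run a reboot loop with the UART console captured for each cycle, then scan the captures for the crash marker. The sketch below assumes a hypothetical layout of one log file per boot (`boot_*.txt`) in a single directory; only the marker string comes from the log in this thread.

```python
# Hypothetical helper: estimate the boot failure rate from per-boot UART captures.
# Assumes one log file per boot cycle named boot_*.txt (illustrative layout only).
import sys
from pathlib import Path

CRASH_MARKER = "Unhandled Exception in EL3"  # string taken from the log in this thread


def failure_rate(logs: list[str]) -> tuple[int, int, float]:
    """Return (failures, total, rate) for a list of per-boot log texts."""
    failures = sum(CRASH_MARKER in text for text in logs)
    total = len(logs)
    return failures, total, (failures / total if total else 0.0)


if __name__ == "__main__":
    log_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    # errors="replace" tolerates the stray bytes that UART captures often contain.
    texts = [p.read_text(errors="replace") for p in sorted(log_dir.glob("boot_*.txt"))]
    fails, total, rate = failure_rate(texts)
    print(f"{fails}/{total} boots hit the EL3 exception ({rate:.2%})")
```

Running the same loop once with the device hot and once at normal temperature would give two rates to compare.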

Please refer to the "Build with docker" page of the NVIDIA/edk2-nvidia wiki on GitHub for details.