AGX Orin boot failed

We ran a boot stress test on an AGX Orin.
About 10% of the boots failed (300 cycles in total).
The error log looks like this:

NOTICE:  BL31: v2.6(release):346877e39
NOTICE:  BL31: Built : 12:32:40, Aug  1 2023
I/TC: Physical secure memory base 0x83c040000 size 0x3fc0000
DCE: FW Boot Done
I/TC: 
I/TC: Non-secure external DT found
I/TC: OP-TEE version: 3.21 (gcc version 9.3.0 (Buildroot 2020.08)) #2 Tue Aug  1 19:39:55 UTC 2023 aarch64
I/TC: WARNING: This OP-TEE configuration might be insecure!
I/TC: WARNING: Please check https://optee.readthedocs.io/en/latest/architecture/porting_guidelines.html
I/TC: Primary CPU initializing
I/TC: Test OEM keys are being used. This is insecure for shipping products!
I/TC: Primary CPU switching to normal world boot
Jetson UEFI firmware (version 4.1-33958178 built on 2023-08-01T19:34:02+00:00)

I/TC: Reserved shared memory is disabled
I/TC: Dynamic shared memory is enabled
I/TC: Normal World virtualization support is disabled
I/TC: Asynchronous notifications are disabled
E/TC:?? 00 get_rpc_alloc_res:645 RPC allocation failed. Non-secure world result: ret=0xffff0000 ret_origin=0
E/LD:   init_elf:486 sys_open_ta_bin(bc50d971-d4c9-42c4-82cb-343fb7f37896)
E/TC:?? 00 ldelf_init_with_ldelf:131 ldelf failed with res: 0xffff000c
Unhandled Exception in EL3.
x30            = 0x0000000050000d00
x0             = 0x0000000000000000
x1             = 0x00000000be000011
x2             = 0x0000000000000000
x3             = 0x0000000000000011
x4             = 0x0000000000100000
x5             = 0x000000082d1fe588
x6             = 0x0000000001000000
x7             = 0x0000000001000000
x8             = 0x00180301d3719223
x9             = 0x000000005001c380
x10            = 0x55aaa055071dbd35
x11            = 0x55aa8255ce1abfe1
x12            = 0x0a0341d0000c0102
x13            = 0x0004ff7f00000000
x14            = 0x00000008065bdba8
x15            = 0x00000008065bdb10
x16            = 0x000000082902803c
x17            = 0x00000000307cf10e
x18            = 0x0000000828f3b2f0
x19            = 0x000000005001cec0
x20            = 0x0000000000000000
x21            = 0x0000000000000000
x22            = 0x0000000000000000
x23            = 0x0000000000000000
x24            = 0x0000000000000000
x25            = 0x0000000000000000
x26            = 0x0000000000000000
x27            = 0x0000000000000000
x28            = 0x0000000000000000
x29            = 0x0000000000000000
scr_el3        = 0x000000000003073d
sctlr_el3      = 0x00000000b0cd183f
cptr_el3       = 0x0000000000000000
tcr_el3        = 0x0000000080823518
daif           = 0x00000000000002c0
mair_el3       = 0x00000000004404ff
spsr_el3       = 0x00000000600003c9
elr_el3        = 0x0000000828f35280
ttbr0_el3      = 0x0000000050026ac1
esr_el3        = 0x00000000be000011
far_el3        = 0x0000000000000000
spsr_el1       = 0x0000000000000000
elr_el1        = 0x0000000000000000
spsr_abt       = 0x0000000000000000
spsr_und       = 0x0000000000000000
spsr_irq       = 0x0000000000000000
spsr_fiq       = 0x0000000000000000
sctlr_el1      = 0x0000000030d00800
actlr_el1      = 0x0000000000000000
cpacr_el1      = 0x0000000000300000
csselr_el1     = 0x0000000000000000
sp_el1         = 0x0000000000000000
esr_el1        = 0x0000000000000000
ttbr0_el1      = 0x0000000000000000
ttbr1_el1      = 0x0000000000000000
mair_el1       = 0x0000000000000000
amair_el1      = 0x0000000000000000
tcr_el1        = 0x0000000000000000
tpidr_el1      = 0x0000000000000000
tpidr_el0      = 0x0000000080000000
tpidrro_el0    = 0x0000000000000000
par_el1        = 0x0000000000000800
mpidr_el1      = 0x0000000081000000
afsr0_el1      = 0x0000000000000000
afsr1_el1      = 0x0000000000000000
contextidr_el1 = 0x0000000000000000
vbar_el1       = 0x0000000000000000
cntp_ctl_el0   = 0x0000000000000005
cntp_cval_el0  = 0x0000000020fb139b
cntv_ctl_el0   = 0x0000000000000000
cntv_cval_el0  = 0x0000000000000000
cntkctl_el1    = 0x0000000000000000
sp_el0         = 0x0000000828f3b2f0
isr_el1        = 0x0000000000000040
cpuectlr_el1   = 0xa000000b40543000
gicd_ispendr regs (Offsets 0x200 - 0x278)
 Offset:			value
0000000000000200:		0x0000000000000000
0000000000000204:		0x0000000000000000
0000000000000208:		0x0000000000000000
000000000000020c:		0x0000000000000000
0000000000000210:		0x0000000000000000
0000000000000214:		0x0000000000000000
0000000000000218:		0x0000000000010000
000000000000021c:		0x0000000000020000
0000000000000220:		0x0000000000000000
0000000000000224:		0x0000000000000000
0000000000000228:		0x0000000000000000
000000000000022c:		0x0000000000000000
0000000000000230:		0x0000000000000000
0000000000000234:		0x0000000000000000
0000000000000238:		0x0000000000000000
000000000000023c:		0x0000000000000000
0000000000000240:		0x0000000000000000
0000000000000244:		0x0000000000000000
0000000000000248:		0x0000000000000000
000000000000024c:		0x0000000000000000
0000000000000250:		0x0000000000000000
0000000000000254:		0x0000000000000000
0000000000000258:		0x0000000000000000
000000000000025c:		0x0000000000000000
0000000000000260:		0x0000000000000000
0000000000000264:		0x0000000000000000
0000000000000268:		0x0000000000000000
000000000000026c:		0x0000000000000000
0000000000000270:		0x0000000000000000
0000000000000274:		0x0000000000000000
0000000000000278:		0x0000000000000000
000000000000027c:		0x0000000000000000
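
The `esr_el3` and `spsr_el3` values in the dump above can be decoded by hand. The following is a minimal sketch, assuming the standard Arm ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]) and the AArch64 SPSR mode field in bits [3:0]. The function names and the small lookup tables are illustrative and cover only a few common exception classes.

```python
# Minimal decoder for the syndrome values printed in the EL3 crash dump.
# Field positions follow the Arm architecture definition of ESR_ELx and SPSR_ELx.
# The tables below are deliberately incomplete (illustration only).

EC_NAMES = {
    0x00: "Unknown reason",
    0x20: "Instruction Abort from a lower Exception level",
    0x21: "Instruction Abort taken without a change in Exception level",
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
    0x2F: "SError interrupt",
}

# SPSR_ELx.M[3:0] encodings when the exception was taken from AArch64 state.
MODE_NAMES = {
    0b0000: "EL0t",
    0b0100: "EL1t",
    0b0101: "EL1h",
    0b1000: "EL2t",
    0b1001: "EL2h",
    0b1100: "EL3t",
    0b1101: "EL3h",
}


def decode_esr(esr):
    """Split an ESR_ELx value into its EC, IL and ISS fields."""
    ec = (esr >> 26) & 0x3F
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not in this table"),
        "il": (esr >> 25) & 0x1,
        "iss": esr & 0x1FFFFFF,
    }


def decode_spsr_mode(spsr):
    """Return the mode the CPU was in when the exception was taken."""
    if spsr & 0x10:  # bit 4 set means the exception came from AArch32 state
        return "AArch32"
    return MODE_NAMES.get(spsr & 0xF, "reserved")


if __name__ == "__main__":
    fields = decode_esr(0xBE000011)      # esr_el3 from the dump
    print(hex(fields["ec"]), fields["ec_name"], hex(fields["iss"]))
    print(decode_spsr_mode(0x600003C9))  # spsr_el3 from the dump
```

For the values in this log, the exception class is 0x2f (SError interrupt) and the saved mode is EL2h. That points to an asynchronous error reported while EL2 code was running; it does not by itself identify the device or access that caused it.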

The full log is attached:
orin-m-5s-stress-err1.log (34.0 KB)
Is this a known problem?
How can we fix it?
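
For a stress test like the one described above, the captured serial logs can be classified automatically. This is only a sketch: it assumes one log file per boot cycle, treats the string `Unhandled Exception in EL3` from the log above as the failure marker, and uses a hypothetical directory name (`boot_logs`). Files are read with `errors="replace"` because the serial capture contains bytes that are not valid UTF-8.

```python
from pathlib import Path

# Marker taken from the failing log in this post. Other messages (for example
# the OP-TEE "RPC allocation failed" line) are not counted as failures here,
# since it is not established that they only appear in failing boots.
FAILURE_MARKER = "Unhandled Exception in EL3"


def boot_failed(log_text):
    """Return True if the log of one boot cycle contains the EL3 crash marker."""
    return FAILURE_MARKER in log_text


def failure_rate(log_texts):
    """Return (failed, total, rate) for an iterable of per-boot log texts."""
    texts = list(log_texts)
    failed = sum(1 for text in texts if boot_failed(text))
    total = len(texts)
    rate = failed / total if total else 0.0
    return failed, total, rate


def scan_directory(log_dir):
    """Read every *.log file in log_dir and compute the failure rate."""
    texts = (
        path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(Path(log_dir).glob("*.log"))
    )
    return failure_rate(texts)


if __name__ == "__main__":
    # "boot_logs" is a hypothetical directory holding one log per boot cycle.
    failed, total, rate = scan_directory("boot_logs")
    print(f"{failed} of {total} boots failed ({rate:.1%})")
```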

Hi,

Is it a DevKit or a custom carrier board?
What L4T version do you use?

It is a custom carrier board.
We use JetPack 5.1.2.

Please check whether the issue can be reproduced on a DevKit.
Also, please rebuild UEFI with debug logging enabled:

Should I compile the latest version, or the one that matches JetPack 5.1.2?

Use the r35.4.1-updates branch.

After rebuilding the UEFI firmware, the boot failure occurred again.
Failure log:
orin_error_5s.log (117.7 KB)
Successful log:
orin_success_5s.log (195.3 KB)

Do you have any PCIe devices enabled on your custom board?

Our board has a PCIe link between the master and the slave, but it is not enabled.
There is also a PCIe M.2 slot for a drive, but no device is installed in it.
There are no other PCIe devices.

Are you booting from eMMC or NVMe now? Could you disable PCIe in the UEFI device tree first and see if that bypasses the issue?
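
One way to confirm which PCIe controllers are still enabled is to decompile the device tree blob to text (for example with `dtc -I dtb -O dts`) and inspect the `status` property of each PCIe node. The sketch below assumes the nodes are named `pcie@<address>` and that the text is in the one-property-per-line form that `dtc` emits. The node addresses in the sample are illustrative, not taken from this board.

```python
import re

# Matches the opening line of a node such as "pcie@14100000 {",
# optionally preceded by a label ("pcie_c1: pcie@14100000 {").
NODE_OPEN = re.compile(r"^\s*(?:[\w-]+:\s*)?(pcie@[0-9a-fA-F]+)\s*\{")
STATUS = re.compile(r'^\s*status\s*=\s*"([^"]+)"\s*;')


def pcie_status(dts_text):
    """Map each pcie@... node to its status string.

    A node without a status property is reported as "okay", which is the
    device tree default. Status properties of child nodes are ignored.
    """
    result = {}
    current = None  # name of the pcie node currently being scanned
    depth = 0       # brace depth relative to that node
    for line in dts_text.splitlines():
        if current is None:
            match = NODE_OPEN.match(line)
            if match:
                current = match.group(1)
                result[current] = "okay"
                depth = line.count("{") - line.count("}")
                if depth <= 0:  # node opened and closed on the same line
                    current = None
            continue
        match = STATUS.match(line)
        if match and depth == 1:
            result[current] = match.group(1)
        depth += line.count("{") - line.count("}")
        if depth <= 0:
            current = None
    return result


if __name__ == "__main__":
    sample = """
    bus@0 {
        pcie@14100000 {
            status = "okay";
        };
        pcie@141a0000 {
            status = "disabled";
        };
    };
    """
    for node, status in pcie_status(sample).items():
        print(node, status)
```

Any node still reported as `okay` after the change would need `status = "disabled";` in the device tree that UEFI actually loads.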

It boots from eMMC.
OK, I will test.
Currently I change the device tree by replacing the files under /boot/dtb/.
Is that a valid way to test?
Will it affect which device UEFI boots from?

Since you are booting from eMMC, that does not matter for now.

Please just try it.
