Orin NX reboots repeatedly but switches to slot B unexpectedly

Hi,
We just received the developer kit. We will set it up and try to reproduce the issue.

One quick question: can we re-flash it, or do we have to use the pre-flashed system?

Hi,

The devkit is already pre-flashed with a full repro setup that logs events and highlights whether a boot swap has happened. I recommend using UART to collect logs for the hang case.

You should have received a PDF file that explains how to check the logs, along with full instructions if you want to reflash it (with ROOTFS_AB=1 enabled). Can you confirm that you have access to this PDF?

As a first step, let the system run for a couple of days (it reboots automatically), then check the logged events to confirm the boot swap.
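Checking the logged events after a multi-day run comes down to finding the boots where the active slot differs from the previous boot. Below is a minimal sketch of that check. It assumes the active slot was recorded once per boot, for example from `nvbootctrl get-current-slot`, which reports 0 or 1 on L4T. The input format and the function names are hypothetical, and the provided repro's own event log may record this differently.

```python
from typing import List, Sequence


def find_slot_swaps(slots: Sequence[str]) -> List[int]:
    """Return indices of boots whose active slot differs from the previous boot.

    `slots` holds one entry per boot, in boot order (hypothetical input format,
    e.g. "0"/"1" or "A"/"B").
    """
    return [i for i in range(1, len(slots)) if slots[i] != slots[i - 1]]


def summarize(slots: Sequence[str]) -> str:
    """One-line summary: boot count, number of slot changes, first change."""
    swaps = find_slot_swaps(slots)
    if not swaps:
        return f"{len(slots)} boots, no slot change"
    first = swaps[0]
    return (f"{len(slots)} boots, {len(swaps)} slot change(s); first at boot "
            f"{first}: {slots[first - 1]} -> {slots[first]}")


if __name__ == "__main__":
    recorded = ["0", "0", "0", "1", "1"]  # illustrative data, not a real log
    print(summarize(recorded))
```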

As this is an unfused Orin NX module, you can later reflash only UEFI and ATF, which are the key components for investigation and possible fixes. This keeps the auto-reboot and event logging active.

If you need further details, let us know.

Hi,

Our NVIDIA representative informed us that you have started investigating on your end. Was the description of how to check the issue in the provided document clear enough, or do you need further explanation?

Thanks

Hi,
We can reproduce the error with the default setup:

Unhandled Exception in EL3.
x30            = 0x0000000050000d00
x0             = 0x0000000000000000
x1             = 0x00000000be000011
x2             = 0x0000000000000000
x3             = 0x0000000000000011
x4             = 0x0000000000100000
x5             = 0x000000046e9fda48
x6             = 0x0000000401000000
x7             = 0x0000000401000000
x8             = 0x0000000000000000
x9             = 0x000000005001c380
x10            = 0x55aaa055071dbd35
x11            = 0x55aa8255ce1abfe1
x12            = 0x0000000000000000
x13            = 0x000000000002700f
x14            = 0x0000000000000006
x15            = 0x0000000000000002
x16            = 0x000000046a73cdac
x17            = 0x00000000467c7f3d
x18            = 0x00000004687bb2f0
x19            = 0x000000005001cec0
x20            = 0x0000000000000000
x21            = 0x0000000000000000
x22            = 0x0000000000000000
x23            = 0x0000000000000000
x24            = 0x0000000000000000
x25            = 0x0000000000000000
x26            = 0x0000000000000000
x27            = 0x0000000000000000
x28            = 0x0000000000000000
x29            = 0x0000000000000000
scr_el3        = 0x000000000003073d
sctlr_el3      = 0x00000000b0cd183f
cptr_el3       = 0x0000000000000000
tcr_el3        = 0x0000000080823518
daif           = 0x00000000000002c0
mair_el3       = 0x00000000004404ff
spsr_el3       = 0x00000000600003c9
elr_el3        = 0x00000004687b5280
ttbr0_el3      = 0x0000000050026ac1
esr_el3        = 0x00000000be000011
far_el3        = 0x0000000000000000
spsr_el1       = 0x0000000000000000
elr_el1        = 0x0000000000000000
spsr_abt       = 0x0000000000000000
spsr_und       = 0x0000000000000000
spsr_irq       = 0x0000000000000000
spsr_fiq       = 0x0000000000000000
sctlr_el1      = 0x0000000030d00800
actlr_el1      = 0x0000000000000000
cpacr_el1      = 0x0000000000300000
csselr_el1     = 0x0000000000000004
sp_el1         = 0x0000000000000000
esr_el1        = 0x0000000000000000
ttbr0_el1      = 0x0000000000000000
ttbr1_el1      = 0x0000000000000000
mair_el1       = 0x0000000000000000
amair_el1      = 0x0000000000000000
tcr_el1        = 0x0000000000000000
tpidr_el1      = 0x0000000000000000
tpidr_el0      = 0x0000000080000000
tpidrro_el0    = 0x0000000000000000
par_el1        = 0x0000000000000800
mpidr_el1      = 0x0000000081000000
afsr0_el1      = 0x0000000000000000
afsr1_el1      = 0x0000000000000000
contextidr_el1 = 0x0000000000000000
vbar_el1       = 0x0000000000000000
cntp_ctl_el0   = 0x0000000000000005
cntp_cval_el0  = 0x000000001fa01bf7
cntv_ctl_el0   = 0x0000000000000000
cntv_cval_el0  = 0x0000000000000000
cntkctl_el1    = 0x0000000000000000
sp_el0         = 0x00000004687bb2f0
isr_el1        = 0x0000000000000040
cpuectlr_el1   = 0xa000000b40543000
gicd_ispendr regs (Offsets 0x200 - 0x278)
 Offset:			value
0000000000000200:		0x0000000000000000
0000000000000204:		0x0000000000000000
0000000000000208:		0x0000000000000000
000000000000020c:		0x0000000000000000
0000000000000210:		0x0000000000000000
0000000000000214:		0x0000000000000000
0000000000000218:		0x0000000000010000
000000000000021c:		0x0000000000020000
0000000000000220:		0x0000000000000000
0000000000000224:		0x0000000000000000
0000000000000228:		0x0000000000000000
000000000000022c:		0x0000000000000000
0000000000000230:		0x0000000000000000
0000000000000234:		0x0000000000000000
0000000000000238:		0x0000000000000000
000000000000023c:		0x0000000000000000
0000000000000240:		0x0000000000000000
0000000000000244:		0x0000000000000000
0000000000000248:		0x0000000000000000
000000000000024c:		0x0000000000000000
0000000000000250:		0x0000000000000000
0000000000000254:		0x0000000000000000
0000000000000258:		0x0000000000000000
000000000000025c:		0x0000000000000000
0000000000000260:		0x0000000000000000
0000000000000264:		0x0000000000000000
0000000000000268:		0x0000000000000000
000000000000026c:		0x0000000000000000
0000000000000270:		0x0000000000000000
0000000000000274:		0x0000000000000000
0000000000000278:		0x0000000000000000
000000000000027c:		0x0000000000000000

We will need to re-flash the device and debug further.
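To help read the dump, here is a minimal sketch that decodes `esr_el3` and the mode field of `spsr_el3` using the bit layouts from the Arm Architecture Reference Manual. It handles only the fields relevant here, and the function names are hypothetical. For `esr_el3 = 0xbe000011` it yields EC 0x2F (SError interrupt) with DFSC 0x11 (asynchronous SError interrupt). For `spsr_el3 = 0x600003c9` it yields mode EL2h. Together, these indicate an asynchronous error taken to EL3 while the CPU was executing at EL2. That is consistent with a failed bus access such as a PCIe/NVMe transaction, but it does not prove one by itself.

```python
"""Decode a few AArch64 syndrome fields from the EL3 crash dump.

Bit layouts follow the Arm Architecture Reference Manual (ESR_ELx, SPSR_ELx).
Only the fields needed to read this dump are handled.
"""

# Exception Class values this sketch names; everything else is "other".
EC_NAMES = {
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
    0x2F: "SError interrupt",
}

# AArch64 mode encodings in SPSR_ELx.M[3:0] (valid when M[4] == 0).
AARCH64_MODES = {
    0x0: "EL0t", 0x4: "EL1t", 0x5: "EL1h",
    0x8: "EL2t", 0x9: "EL2h", 0xC: "EL3t", 0xD: "EL3h",
}


def decode_esr(esr: int) -> dict:
    """Split ESR_ELx into EC, IL and ISS; add SError-specific ISS fields."""
    ec = (esr >> 26) & 0x3F   # Exception Class, bits [31:26]
    il = (esr >> 25) & 0x1    # Instruction Length, bit 25
    iss = esr & 0x1FFFFFF     # Instruction Specific Syndrome, bits [24:0]
    fields = {"EC": ec, "EC_name": EC_NAMES.get(ec, "other"), "IL": il, "ISS": iss}
    if ec == 0x2F:
        fields["IDS"] = (iss >> 24) & 0x1   # 1 = implementation-defined syndrome
        fields["AET"] = (iss >> 10) & 0x7   # asynchronous error type
        fields["EA"] = (iss >> 9) & 0x1     # external abort type
        fields["DFSC"] = iss & 0x3F         # 0x11 = asynchronous SError interrupt
    return fields


def decode_spsr_mode(spsr: int) -> str:
    """Return the Exception level/stack pointer in use when the exception was taken."""
    if spsr & 0x10:
        return "AArch32"
    return AARCH64_MODES.get(spsr & 0xF, "reserved")


if __name__ == "__main__":
    # Values copied from the dump above.
    print(decode_esr(0xBE000011))
    print(decode_spsr_mode(0x600003C9))
```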

Hi Dane,

Thanks for the feedback. I am glad that you were able to reproduce the issue on your end, and we hope you will be able to find the root cause. We currently have more than 200 Orin NX modules with this feature in quarantine, so being able to put them back into use with a hotfix soon would be great.

Thanks

Hi Dane,

Did you witness this issue after reflashing the devkit?

Can you share any preliminary results?

Thanks

Hi,
We have run some tests and would like to share the results. We flashed the developer kit with JetPack 6.2.2 (r36.5) and still observe the issue: it fails in UEFI while accessing the NVMe SSD, the same as with r35.6.2. We then limited PCIe C4 to Gen1 (Gen4 by default), and the system passed 1200 reboot cycles.
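Limiting C4 to Gen1 trades link bandwidth for stability. To give a sense of scale, the sketch below computes theoretical per-direction throughput from the standard PCIe line rates and encodings. It assumes the NVMe SSD sits on an x4 link on C4; the lane count is an assumption and is not stated in this thread. It also ignores packet and protocol overhead.

```python
"""Rough PCIe bandwidth comparison: Gen4 (default) vs Gen1 (workaround)."""

# Per-lane raw rate in GT/s and line-encoding efficiency for each generation.
PCIE_GENS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
}


def link_mb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction throughput in MB/s (10**6 bytes/s).

    Only line-encoding overhead is removed; packet/protocol overhead is
    ignored, so real NVMe throughput is lower.
    """
    rate_gt, efficiency = PCIE_GENS[gen]
    gbit_per_s = rate_gt * efficiency * lanes
    return gbit_per_s * 1000 / 8


if __name__ == "__main__":
    LANES = 4  # assumption: the NVMe slot on C4 is wired as x4
    for gen in (4, 1):
        print(f"Gen{gen} x{LANES}: {link_mb_per_s(gen, LANES):.0f} MB/s")
```

Under those assumptions, Gen1 caps the link at roughly 1000 MB/s, compared with about 7877 MB/s at Gen4. The Gen1 result is therefore useful for narrowing down the cause, but it comes at a substantial bandwidth cost.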