Fail to boot in EL3

This post is a continuation of the following topic.
Unhandled Exception in EL3 - Jetson & Embedded Systems / Jetson Orin NX - NVIDIA Developer Forums

I tested an Orin NX 16GB module on an Orin Nano Developer Kit.
Only power and the UART serial console are connected.
I rebuilt the UEFI source code and dumped the log.

I> MB2 finished

NOTICE:  BL31: v2.6(release):07eea4970
NOTICE:  BL31: Built : 07:55:15, Mar 19 2023
Unhandled Exception in EL3.
x30            = 0x0000000050000d00
x0             = 0x0000000000000000
x1             = 0x00000000be000011
x2             = 0x0000000000000000
x3             = 0x0000000000000011
x4             = 0x0000000080000000

Here is the full log.
full_log.txt (33.5 KB)

@WayneWWW, could you check it? Thanks.

The crash point does not seem to be related to UEFI.

Are you sure you need to rebuild the UEFI source code to reproduce this issue?

No, a rebuild is not needed to reproduce it.

So how do you reproduce this issue on the devkit?

Just reboot many times.
I created a service that reboots after 30 seconds of sleep.

[reboot_test.service]

[Unit]
Description = reboot_test.service
After=dev-%i.device systemd-user-sessions.service plymouth-quit-wait.service getty-pre.target
After=rc-local.service
Before=getty.target
Conflicts=rescue.service
Before=rescue.service

[Service]
ExecStart = /usr/local/bin/reboot_test.sh
Type = simple

[Install]
WantedBy=getty.target

[reboot_test.sh]

#!/bin/bash

sleep 30
reboot
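
To report how many iterations it took to hit the failure, the script above could also keep a persistent boot counter. The following is only a sketch under assumptions: the counter path, the REBOOT_TEST_ARMED guard variable, and the function name bump_boot_count are hypothetical and are not part of L4T or of the original test.

```shell
#!/bin/bash
# Hypothetical variant of reboot_test.sh that records how many boots completed.
# COUNT_FILE and REBOOT_TEST_ARMED are illustrative names, not L4T settings.

COUNT_FILE="${COUNT_FILE:-/var/lib/reboot_test/count}"

# Increment the persistent boot counter in FILE and print the new value.
bump_boot_count() {
    local file="$1" count=0
    mkdir -p "$(dirname "$file")"
    # Read the previous count if the file exists.
    if [ -f "$file" ]; then
        read -r count < "$file" || true
    fi
    # Fall back to 0 if the file was empty or did not hold a plain number.
    case "$count" in
        ''|*[!0-9]*) count=0 ;;
    esac
    count=$((count + 1))
    printf '%s\n' "$count" > "$file"
    sync    # flush the counter to storage before the reboot
    printf '%s\n' "$count"
}

# Reboot only when explicitly armed, so the script can be dry-run safely.
if [ "${REBOOT_TEST_ARMED:-0}" = "1" ]; then
    n="$(bump_boot_count "$COUNT_FILE")"
    echo "reboot_test: boot #$n completed"
    sleep 30
    reboot
fi
```

With this variant the unit would also need Environment=REBOOT_TEST_ARMED=1 in its [Service] section; without it the script does nothing. The counter only advances on boots that reach the OS, so after a hang in EL3 it still holds the number of successful iterations before the failure.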

Is it based on rel-35.3.1?

Yes, it is.
I have made no changes to the sample root file system.

What is the reproduction rate of this issue? I mean, how many reboot iterations does it take to hit it?

About 1 in 5000 reboots, or even less often.
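
A rate this low means a short test run proves very little. As a rough guide, and assuming each reboot fails independently with the same probability p (an assumption, not something the logs confirm), the number of cycles n needed to see at least one failure with confidence c follows from 1 - (1 - p)^n >= c. The helper name cycles_needed below is hypothetical.

```shell
#!/bin/bash
# cycles_needed P CONFIDENCE
# Prints the smallest n with 1 - (1 - P)^n >= CONFIDENCE,
# assuming independent reboots with a constant failure probability P.
cycles_needed() {
    awk -v p="$1" -v c="$2" 'BEGIN {
        n = log(1 - c) / log(1 - p)   # awk log() is the natural logarithm
        r = int(n)
        if (r < n) r++                # round up to a whole cycle
        print r
    }'
}

# 1 failure in 5000 reboots, 95% chance of seeing it at least once.
cycles_needed 0.0002 0.95
```

For p = 1/5000 this comes out at roughly 15,000 cycles, so a run of a few thousand clean reboots on a new release is not enough to conclude that the issue is gone.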

Hi WayneWWW,

Just wanted to let you know that S.Harumoto is not the only one with this problem. We recently hit the same issue on our system: Orin NX 16GB module + custom carrier board + L4T 35.3.1.

Here’s how we test it: Power on → Enter the OS and wait until the wireless network is connected → Power off → Power on again after 20 seconds

This is part of the reliability verification of the system. The power-on and power-off intervals are fixed, and the module's heat dissipation is good. At first, the test ran without problems.

After a few hundred cycles, we hit the same issue that S.Harumoto described.
orin-nx-bootfail_log.txt (95.1 KB)

I> Task: Program display sticky bits (0x50026a84)
I> Task: Storage device deinit (0x50001eec)
I> Task: SMMU external bypass disable (0x50016a60)
I> Task: SMMU init (0x5001697c)
I> Task: Program GICv3 registers (0x50026ba8)
I> Task: Audit firewall settings (0x50023bd0)
I> Task: Bootchain failure check (0x50002434)
I> Current Boot-Chain Slot: 0
I> BR-BCT Boot-Chain is 0, and status is 1. Set UPDATE_BRBCT bit to 0
I> MB2 finished

NOTICE: BL31: v2.6(release):07eea4970
NOTICE: BL31: Built : 07:55:15, Mar 19 2023
Unhandled Exception in EL3.
x30 = 0x0000000050000d00
x0 = 0x0000000000000000
x1 = 0x00000000be000011
x2 = 0x0000000000000000
x3 = 0x0000000000000011
x4 = 0x0000000080000000
x5 = 0xaa0203f4aa0003f3
x6 = 0x000000008014d340
x7 = 0x0000000000000001
x8 = 0x000000000c199000
x9 = 0x000000005001c380
x10 = 0x55aaa055071dbd35

We had to disconnect the power completely, reconnect it, and then power on again. After that, the boot process was normal.

We’re planning to retest it based on L4T 35.4.1.

Although the issue occurs very rarely, we would still like your suggestions. What does "Unhandled Exception in EL3" mean? Thanks!

If anyone else has observed this issue, let’s discuss it, thanks a lot!

B&R
zzw

Hi,

Please test with rel-35.4.1 or later on the devkit and share the exact reproduction rate of this issue.
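
One way to get an exact number is to capture the UART output for the whole test run and count boots and failures from it. The sketch below assumes that every boot attempt prints "MB2 finished" exactly once and that every failure prints "Unhandled Exception in EL3", as in the logs posted in this thread. The function name el3_failure_rate and the log file name are hypothetical.

```shell
#!/bin/bash
# el3_failure_rate LOGFILE
# Counts boot attempts and EL3 failures in a captured UART log.
# Assumes one "MB2 finished" line per boot attempt (taken from the logs above).
el3_failure_rate() {
    local log="$1" boots fails
    # -a treats the log as text even if it contains stray UART bytes.
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    boots="$(grep -a -c 'MB2 finished' "$log" || true)"
    fails="$(grep -a -c 'Unhandled Exception in EL3' "$log" || true)"
    echo "boots=$boots failures=$fails"
}

# Example (hypothetical file name):
#   el3_failure_rate uart_capture.log
```

Reporting both counts rather than a single ratio makes it easier to compare runs of different lengths across releases.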

The meaning of "Unhandled Exception in EL3" does not really matter here. It is a generic error log that could be triggered by many causes.
