Unhandled Exception in EL3 error after AGX Xavier reset

I flashed the Xavier Industrial P2888 module on our custom carrier board with the new pinmux settings, using the command “sudo ./flash.sh jetson-agx-xavier-industrial-cti mmcblk0p1”. Right after the flash completed, the Xavier worked: I was able to display the Ubuntu desktop on DP0 and complete the Ubuntu setup. However, when I reset or power-cycle the Xavier, I get an “Unhandled Exception in EL3” error in the UART log (see the attachment) and it hangs:
uart_error_log_reset after flashed…txt (6.3 KB)
What does this error mean, and what causes it?

Thanks
Joe Lam

Since this is a custom board and a new JP5 release, could you report this problem to the custom board vendor and have them contact us if they cannot resolve it?

We are using JetPack 4.6.1 to flash the Xavier, and we designed the custom board based on the P2822_B03 Jetson Xavier Developer Kit carrier board schematic, with some modifications. I would like to know what this error means and where I should look for its cause.

I can’t answer this, but EL3 is the highest-privilege ARM exception level, where the secure monitor (ARM Trusted Firmware) runs. All I could figure out is that there is some sort of trusted firmware failure in early boot. I found some related articles, but nothing specific (look at the exception level 3 (EL3) topics):
https://chromium.googlesource.com/external/github.com/ARM-software/arm-trusted-firmware/+/v1.4-rc0/docs/firmware-design.md

I am assuming it is ARMv8-A core 0 that reported the error, but I suppose the error could have come from some other boot or control processor (doubtful, but I don’t know enough to say for sure).
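One way to narrow down what kind of EL3 exception this is: ARM Trusted Firmware crash dumps usually list the EL3 system registers, and the ESR_EL3 value encodes the exception class in bits [31:26]. The sketch below assumes the attached log contains a line like “esr_el3 = 0x…” (not verified against this particular log); the EC table is a subset taken from the Arm Architecture Reference Manual, and the function names are hypothetical.

```python
import re

# Subset of Exception Class (EC) values for ESR_ELx bits [31:26],
# taken from the Arm Architecture Reference Manual.
EC_NAMES = {
    0x00: "Unknown reason",
    0x15: "SVC from AArch64",
    0x17: "SMC from AArch64",
    0x18: "Trapped MSR/MRS/system instruction",
    0x20: "Instruction abort from a lower EL",
    0x21: "Instruction abort from the same EL",
    0x22: "PC alignment fault",
    0x24: "Data abort from a lower EL",
    0x25: "Data abort from the same EL",
    0x26: "SP alignment fault",
    0x2F: "SError interrupt",
}

# Assumption: the crash dump contains a line such as "esr_el3 = 0x...".
ESR_RE = re.compile(r"esr_el3\s*[:=]\s*(0x[0-9a-fA-F]+)")


def decode_esr(esr: int) -> dict:
    """Split an ESR_EL3 value into its EC, IL and ISS fields."""
    ec = (esr >> 26) & 0x3F
    return {
        "ec": ec,
        "description": EC_NAMES.get(ec, "not in this table; see the Arm ARM"),
        "il": (esr >> 25) & 0x1,  # instruction length bit, bit [25]
        "iss": esr & 0x1FFFFFF,   # instruction-specific syndrome, bits [24:0]
    }


def find_esr_el3(log_text: str):
    """Decode the first esr_el3 value found in a UART log, or return None."""
    match = ESR_RE.search(log_text)
    return decode_esr(int(match.group(1), 16)) if match else None
```

For example, an EC of 0x2F (SError) would point toward an asynchronous bus or memory error rather than a software fault, which would be worth knowing when checking the power sequencing.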

I’ll be curious to find out the cause once it is solved. Since you used the default flash, this is unusual, unless something needs signing or did not flash correctly (resulting in corruption somewhere that requires a signature).

You told us you are using JetPack 4.6.1, but your log says you are using JP5. Which one is correct?

You are correct. We are using JetPack 5.0.1.

OK, so is there any issue with reporting this to the board vendor first? I think the board is from ConnectTech, right?

We designed and built the carrier board; our company is Cornet Technology, Inc. I am now checking the power-up and power-down sequence. We are using a power-button supervisor MCU in the auto-power-on configuration.

OK, so you are the board vendor. Please state this clearly when you file a topic.

If this is a custom board, did you remember to set the CVB EEPROM read size to 0 in the BCT cfg file?
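A minimal sketch of that edit, assuming the MB1 BCT misc cfg in Linux_for_Tegra/bootloader contains a line of the form “cvb_eeprom_read_size = 0x100;”. The field name, its default value, and which cfg file your board configuration actually uses are assumptions here; check them against your own L4T release before relying on this. The script keeps a “.orig” copy (a hypothetical naming choice) before rewriting the file.

```python
import re
from pathlib import Path

# Assumption: the BCT cfg contains a line like "cvb_eeprom_read_size = 0x100;".
FIELD = "cvb_eeprom_read_size"
PATTERN = re.compile(
    r"^(\s*" + FIELD + r"\s*=\s*)(0x[0-9a-fA-F]+|\d+)(\s*;)", re.MULTILINE
)


def zero_cvb_eeprom_read_size(text: str) -> tuple:
    """Return (patched_text, number_of_lines_changed)."""
    return PATTERN.subn(r"\g<1>0x0\g<3>", text)


def patch_file(path: Path) -> int:
    """Set the field to 0x0 in place, keeping a .orig backup if anything changed."""
    original = path.read_text()
    patched, count = zero_cvb_eeprom_read_size(original)
    if count:
        path.with_name(path.name + ".orig").write_text(original)
        path.write_text(patched)
    return count


if __name__ == "__main__":
    import sys
    # Usage: python3 zero_cvb.py <bct cfg file> [...]
    for arg in sys.argv[1:]:
        print(arg, patch_file(Path(arg)))
```

If the script reports 0 changes, the field is either named differently or set elsewhere in your release, and the cfg should be inspected by hand.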

Also, since the error is in UEFI, could you build the UEFI debug build to enable the full log?

Hello, I am with Cornet, working on the same project as klaml3o9. I am trying to build UEFI on an Ubuntu Linux system and get the errors pasted at the end.

I followed the guidelines to install Mono and updated the images, but I still get the errors. Could you help me get over this hump? Do you think it is better to compile UEFI on Windows, since NuGet is inherently a Windows tool?

Thanks

SECTION - Initial update of environment
UpdatingWARNING - [SDE] Failed to fetch NugetDependecy: edk2-acpica-iasl@20200717.0.0: [Nuget] We failed to install this version 20200717.0.0 of edk2-acpica-iasl
WARNING - [SDE] Failed to fetch NugetDependecy: mu_nasm@2.15.05: [Nuget] We failed to install this version 2.15.05 of mu_nasm
. Done
SECTION - Second pass update of environment
UpdatingWARNING - [SDE] Failed to fetch NugetDependecy: mu_nasm@2.15.05: [Nuget] We failed to install this version 2.15.05 of mu_nasm
.WARNING - [SDE] Failed to fetch NugetDependecy: edk2-acpica-iasl@20200717.0.0: [Nuget] We failed to install this version 20200717.0.0 of edk2-acpica-iasl
. Done
ERROR - We were unable to successfully update 2 dependencies in environment
SECTION - Summary

What are your steps to build UEFI? Did you follow the guidance in our public source code tarball?

Here is a debug build:

uefi_Jetson_DEBUG.bin (2.7 MB)

Use it to replace Linux_for_Tegra/bootloader/uefi_jetson.bin, then re-flash.
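The swap can be scripted so the release binary is not lost. This is a sketch only: the backup name “uefi_jetson.bin.orig” is a hypothetical choice, and the script does nothing beyond copying files; the flash itself is still done with flash.sh afterwards.

```python
import shutil
import sys
from pathlib import Path


def install_debug_uefi(l4t_dir: Path, debug_bin: Path) -> Path:
    """Copy debug_bin over bootloader/uefi_jetson.bin; return the backup path."""
    bootloader = l4t_dir / "bootloader"
    target = bootloader / "uefi_jetson.bin"
    # Hypothetical backup name. Only the first run creates it, so repeated
    # runs never overwrite the release binary with a debug one.
    backup = bootloader / "uefi_jetson.bin.orig"
    if not backup.exists():
        shutil.copy2(target, backup)
    shutil.copy2(debug_bin, target)
    return backup


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 install_debug_uefi.py <Linux_for_Tegra dir> <debug .bin>
    print(install_debug_uefi(Path(sys.argv[1]), Path(sys.argv[2])))
```

To go back to the release UEFI, copy the backup over uefi_jetson.bin again before re-flashing.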

Yes, edk2-pytool-extensions/using_linux.md from the tianocore/edk2-pytool-extensions repository on GitHub was one of the guides we followed; it has suggestions for compiling on Linux.

Once the crash does occur, one way to recover quickly without reflashing everything is to flash just the BCT:

sudo ./flash.sh -r -k BCT jetson-agx-xavier-industrial-cti mmcblk0p1

This seems to fix it until the next crash.

Using the debug UEFI from user10090, these are the logs. It seems to be stuck in some kind of loop with errors; I am not sure what these errors are.

logs_with_debug_uefi_version (27.8 KB)

Hi,

I think you are hitting a different issue from the one initially reported. For example, you don’t even enter UEFI now.

Could you monitor the board’s UART log after flashing it with a clean image, and find out how to reproduce this error and whether any specific error log appears before the problem happens?

xavier_uefi_debug_output_with_crash (13.5 KB)

Instead of flashing only the debug UEFI with the flash command, I reflashed the whole system.img, which included the debug UEFI. After reflashing, I power-cycled the board; the serial output is in the attached file.

After the crash, I ran the command
sudo ./flash.sh -r -k BCT jetson-agx-xavier-industrial-cti mmcblk0p1

and the system came up. This time UEFI printed a lot of debug information, with
PROGRESS CODE: V03051005 I0
PROGRESS CODE: V03050000 I0
and
Deleting fragment fragment@0
Deleting fragment fragment@1
etc…
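The PROGRESS CODE lines are EDK2 status-code reports. Per the UEFI PI specification, the 32-bit value packs the class in bits [31:24], the subclass in [23:16] and the operation in [15:0], so V03051005 is class 0x03 (EFI_SOFTWARE), subclass 0x05, operation 0x1005. The sketch below extracts and splits these codes so that a good boot log and a crash log can be compared; the function names are hypothetical, and it does not try to name individual subclasses or operations.

```python
import re

# Format as seen in the log above: "PROGRESS CODE: V03051005 I0".
PROGRESS_RE = re.compile(r"PROGRESS CODE: V([0-9A-Fa-f]{8}) I([0-9A-Fa-f]+)")

# Top-level classes from the UEFI PI specification (bits [31:24]).
CLASSES = {
    0x00: "EFI_COMPUTING_UNIT",
    0x01: "EFI_PERIPHERAL",
    0x02: "EFI_IO_BUS",
    0x03: "EFI_SOFTWARE",
}


def decode_status_code(value: int) -> dict:
    """Split a 32-bit EFI_STATUS_CODE_VALUE into class/subclass/operation."""
    code_class = (value >> 24) & 0xFF
    return {
        "class": CLASSES.get(code_class, hex(code_class)),
        "subclass": (value >> 16) & 0xFF,
        "operation": value & 0xFFFF,
    }


def progress_codes(log_text: str):
    """Yield (value, instance, decoded) for every PROGRESS CODE line."""
    for match in PROGRESS_RE.finditer(log_text):
        value = int(match.group(1), 16)
        yield value, int(match.group(2), 16), decode_status_code(value)
```

The last progress code printed before a hang is a rough indicator of how far UEFI got; if a crash log contains none at all, UEFI was most likely never reached.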

So it looks like the UEFI debug version does work (I had tried flashing just the UEFI, and that did not seem to work).

From the logs, it looks like what you deduced before, i.e. that it is not reaching UEFI, is correct. What else could it be? How do we approach this issue? It happens very consistently when we power-cycle.

Thank you,

Hi,

Is there a consistent crash point in the log?

I mean, in the earliest log it crashed in UEFI, later it crashed without even entering UEFI, and the log you just shared again crashed in UEFI.
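To answer the crash-point question across several logs consistently, a small scanner can report the last boot-stage marker seen before the first “Unhandled Exception in EL3” line. The stage markers below (MB1, MB2, UEFI) are hypothetical substrings assumed to appear as each stage starts; replace them with strings taken from a known-good boot log of your board.

```python
import re

# Hypothetical stage markers: substrings assumed to appear in the UART log
# when each boot stage starts. Replace them with strings from a good boot log.
STAGE_MARKERS = [
    ("MB1", re.compile(r"\bMB1\b")),
    ("MB2", re.compile(r"\bMB2\b")),
    ("UEFI", re.compile(r"\bUEFI\b")),
]
CRASH_RE = re.compile(r"Unhandled Exception in EL3")


def crash_point(log_text: str):
    """Return (last_stage, stage_line, crash_line) for the first EL3
    exception in the log, or None if the log contains no such exception."""
    last_stage, stage_line = "<none>", 0
    for line_no, line in enumerate(log_text.splitlines(), start=1):
        if CRASH_RE.search(line):
            return last_stage, stage_line, line_no
        for stage, pattern in STAGE_MARKERS:
            if pattern.search(line):
                last_stage, stage_line = stage, line_no
    return None
```

Running this over every attached log would show whether the failures always stop after the same stage, which is what distinguishes one root cause from several.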

Does the log you just shared already have the UEFI debug log enabled?

The current logs are the correct ones, with UEFI debug enabled (using the UEFI image from user100090). I can confirm it is the debug UEFI because I see a lot of extra debug messages from UEFI when the board does come up correctly. After I power-cycle, it goes into the crash mode shown in the logs. I do not see any of the debug messages, which tells me that UEFI has not been entered.

Is it possible to test that module on a devkit and see whether the same test leads to the same failure?

If it is not reproducible there, then I can only ask some hardware folks to provide suggestions here.

Unfortunately, we do not have a devkit for the Xavier. We only have one for Orin (which will be custom-made in the future). The custom boards we have are all Xavier. Could you ask the hardware folks whether they have any suggestions?

The other question we had: when flash.sh is run with the BCT partition, what does it do, and why does running it seem to fix the situation? Just flashing the BCT partition (which is much quicker than the whole image) seems to fix the crash. Is there any correlation between a power outage and the BCT area losing information?
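One way to test the BCT-corruption theory, sketched under assumptions: if you can obtain a read-back dump of the BCT partition from a board in the crash state and another from the same board right after a good flash, any differing byte ranges would show that the stored BCT content actually changed. How to obtain those dumps is not covered here, and the file names in the usage comment are hypothetical.

```python
import sys
from pathlib import Path


def diff_ranges(a: bytes, b: bytes) -> list:
    """Return half-open (start, end) byte ranges where a and b differ.
    Any length mismatch is reported as part of a trailing range."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended just before i
            start = None
    if len(a) != len(b) and start is None:
        start = common  # extra bytes in the longer dump
    if start is not None:
        ranges.append((start, max(len(a), len(b))))
    return ranges


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 diff_bct.py good_dump.bin crashed_dump.bin (hypothetical names)
    good, bad = Path(sys.argv[1]).read_bytes(), Path(sys.argv[2]).read_bytes()
    for start, end in diff_ranges(good, bad):
        print(f"0x{start:06x}-0x{end:06x} ({end - start} bytes differ)")
```

If the two dumps are identical, the BCT contents are probably not being lost, and re-flashing the BCT may be fixing the symptom through some other side effect of the flash process.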

Thanks,