Jetson Orin NX stops booting after 7 shutdowns

Hello everyone,

I have an issue with a custom carrier board which I can’t yet reproduce on a devkit.

I flash the Orin NX 16GB, boot it up and check that everything is working.
Then I shut the Jetson down.

Then I start an automated test run that boots the Jetson up and shuts it down again. After seven cycles I get an error.

After flashing, our board reboots once to set its hostname dynamically, and I shut it down once before the test runs.
So in total it can be nine boots before it fails.
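For anyone who wants to reproduce this kind of cycling test, a minimal sketch of the loop is shown below. It is not the script used in this thread. The three callables (`power_on`, `wait_for_boot`, `shut_down`) are hypothetical hooks that you would have to implement for your own power switch and serial or SSH check, and the default of two boots before the test is only my reading of the description above (first boot after flashing plus the hostname reboot).

```python
from typing import Callable, Optional


def run_cold_boot_cycles(
    power_on: Callable[[], None],
    wait_for_boot: Callable[[], bool],
    shut_down: Callable[[], None],
    max_cycles: int,
    boots_before_test: int = 2,
) -> Optional[int]:
    """Cold-boot the board repeatedly until one boot fails.

    All three callables are hypothetical hooks supplied by the caller:
      power_on      -- applies power to the carrier board
      wait_for_boot -- returns True if the OS came up, False on timeout
      shut_down     -- shuts the OS down and removes power

    Returns the total boot count (including boots_before_test) at which the
    first failure was seen, or None if all max_cycles boots succeeded.
    """
    for cycle in range(1, max_cycles + 1):
        power_on()
        if not wait_for_boot():
            # Stop here and leave the board in the failed state for debugging.
            return boots_before_test + cycle
        shut_down()
    return None
```

With a `wait_for_boot` that fails on the seventh test cycle, the function returns 9, which matches the "nine boots in total" count above.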

This is with an Orin NX with SK Hynix memory.
On another Orin NX with Micron RAM it took more than 7 boots, but it also failed.
Here is the debug log:
orin_nx_uefi_error_after_7_boot.txt (32.4 KB)

Do you have any idea how to further debug this?

Hi,
Which JetPack version are you using? And with the same version on the developer kit you don’t observe the issue, correct? Is it cold boot or warm boot?

Hi,

Sure: it is cold boot and JetPack 5.1.3.
On the devkit with the same SOM and the same JetPack, I haven’t seen the issue so far.

Hi everyone,

Does anyone have an idea what I can check to make progress on this issue?
It is still related to the same devices as in this topic:

We are still blocked from flashing the devices and sending them to our customers, and would therefore very much appreciate some progress.

Just to double-confirm: the Orin NX with SK Hynix memory was tested with JP5.1.3 SW, right?
How about the Orin NX with Micron memory? It is supposed to work only with JP5.1.4 SW, which adds the support described in PCN211361.

Hi,

Yes, my tests were with JP5.1.3.
Both devices had the issue, but the Orin NX with the Micron memory took more restarts to show it.

I looked at PCN211361 and compared the level part number on both Orin NX modules. Both have the same one: 900-13767-0000-000.
So are they both affected by this issue, even though one has Micron memory on it?

Hi andreas.brinkhaus,

ASSERT [FvbNorFlashStandaloneMm] /dvs/git/dirty/git-master_linux/out/nvidia/optee.t234-uefi/StandaloneMmOptee_RELEASE/edk2-nvidia/Silicon/NVIDIA/Drivers/FvbNorFlashDxe/FvbNorFlashStandaloneMm.c(868): ((BOOLEAN)(0==1))

For the above assertion, please apply the following patches and check whether they help.
fix(stmm): allow measurement partition to be zero filled · NVIDIA/edk2-nvidia@e4c86ce
fix: reset the meas buffer after computing the first measurement · NVIDIA/edk2-nvidia@615288a

Steps to apply the patches:

Step 1. Apply the patches to the corresponding source files

Step 2. Run the following command to build uefi_StandaloneMmOptee_RELEASE.bin
$ edk2_docker edk2-nvidia/Platform/NVIDIA/StandaloneMmOptee/build.sh

Step 3. Follow the steps in atf_and_optee_README.txt to build the TOS image and update it in <Linux_for_Tegra>/bootloader/tos-optee_t234.img

Step 4. Flash only the QSPI to apply the change

Or you can simply update to the latest JP5.1.4 (R35.6.0), which should include the above patches.
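When comparing logs from several boards or firmware builds, it can help to split an ASSERT line like the one quoted at the top of this reply into its parts. The following is a small sketch, assuming the line keeps the `ASSERT [module] path(line): expression` layout seen in this thread; the `UefiAssert` type and the `parse_assert` helper are names made up for this example.

```python
import re
from typing import NamedTuple, Optional


class UefiAssert(NamedTuple):
    module: str       # driver name inside the square brackets
    source_file: str  # full build path of the source file
    line: int         # line number inside the parentheses
    expression: str   # the asserted expression as printed


# Layout assumed from the log lines in this thread:
#   ASSERT [<module>] <path>(<line>): <expression>
_ASSERT_RE = re.compile(
    r"ASSERT \[(?P<module>[^\]]+)\] (?P<path>.+?)\((?P<line>\d+)\): (?P<expr>.*)"
)


def parse_assert(text: str) -> Optional[UefiAssert]:
    """Return the parsed assertion, or None if text contains no ASSERT line."""
    m = _ASSERT_RE.search(text)
    if m is None:
        return None
    return UefiAssert(m["module"], m["path"], int(m["line"]), m["expr"].strip())
```

For the line above this yields module `FvbNorFlashStandaloneMm`, line 868 and the expression `((BOOLEAN)(0==1))`. As far as I know that expression is what the EDK2 `FALSE` macro expands to, so this looks like an unconditional `ASSERT (FALSE)` in that driver.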

Hi everyone,

I worked on integrating the patches but ran into a new error along the way:

I selected the r35.5.0-updates version of the edk2 setup.
Then I compiled it and integrated it into JetPack without the patches, to see whether my setup works.
To my surprise, after flashing the board successfully, it cold-booted many times without any issues.

I let my automated cold-boot setup run overnight. After 800 to 1100 cold boots I got a new error:
ASSERT [VariableStandaloneMm] /build/r35.5.0-updates/edk2/MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c(3264): !(((INTN)(RETURN_STATUS)(Status)) < 0)

What is also interesting: last time, I got the same error again on every boot.
Now I got the new error only once, and on all following boots I get a second, different error:
orin_nx_error_after_boot_800-1100.txt (40.0 KB)

What I am wondering now:

  1. Is r35.5.0-updates also the version used in L4T 35.5.0, so that it should give me the same error I had before?
  2. Do you know the new errors and can you tell me a fix?
  3. Is it worth trying the patches, since they address the old error, which seems to be fixed in the version I compiled now?
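To show more clearly which assertion fired on which boot (one error once, then a different one on every following boot), a long serial capture can be tallied per boot. This is only a sketch under two assumptions: the whole test run is captured in one text file, and the test harness writes a marker line before each power-on. The marker string `=== BOOT ===` and the function name `assert_history` are hypothetical and not something the firmware prints.

```python
import re

# Same ASSERT layout as in the log lines of this thread.
ASSERT_RE = re.compile(
    r"ASSERT \[(?P<module>[^\]]+)\] (?P<path>.+?)\((?P<line>\d+)\):"
)


def assert_history(lines, boot_marker="=== BOOT ==="):
    """Map each distinct assertion to the list of boot numbers it fired on.

    boot_marker is a hypothetical line written by the test harness before
    each power-on; adapt it to whatever marks a new boot in your capture.
    Keys are (module, source file name, line number).
    """
    history = {}
    boot = 0
    for raw in lines:
        if boot_marker in raw:
            boot += 1
            continue
        m = ASSERT_RE.search(raw)
        if m:
            key = (m["module"], m["path"].rsplit("/", 1)[-1], int(m["line"]))
            history.setdefault(key, []).append(boot)
    return history
```

An assertion that appears for a single boot followed by a different one on every later boot would then show up as one key with a single boot number and a second key with a consecutive run, which makes it easier to describe the pattern when attaching logs.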

r35.5.0-updates is a branch based on r35.5.0 with some additional fixes integrated.

It seems you hit an issue similar to the one in the following thread. Please check whether it helps in your case.
Orin NX hangs on optee (r35.5.0) - #5 by WayneWWW

We would still suggest applying the patches to fix the potential assertion issue.

Hello everyone,

here is an update from my side:

I have now compiled everything from the branch uefi-202210.5. This seems to include the first set of patches mentioned in this forum topic.
Then I also implemented the optee change from the other forum topic.
So far this seems to work fine.

My question now is:
Is it valid to choose the branch uefi-202210.5 for L4T 35.5.0?
The versioning of the UEFI combinations is not very transparent, but as I read it, it should be fine?

Okay, you should be fine to use this branch for L4T R35.5.0
