AGX Orin Boot Hang on Optee

Hi Folks -

We are stress testing some AGX Orin units by power cycling them repeatedly without warning. We realize this is not a friendly thing to do, but our application requires it. After a few hundred cycles (more than 200 but fewer than 1000), the machines eventually stop booting.

The serial console returns the following error message:

[2023-02-12 13:35:10.201831] ESC   to enter Setup.
[2023-02-12 13:35:10.201849] F11   to enter Boot Manager Menu.
[2023-02-12 13:35:10.201867] Enter to continue boot.
[2023-02-12 13:35:10.201884] **********************************
[2023-02-12 13:35:10.217694] **  WARNING: Test Key is used.  **
[2023-02-12 13:35:10.217801] **********************************
[2023-02-12 13:35:10.217824] **  WARNING: Test Key is used.  **
[2023-02-12 13:35:15.209196] ......PROGRESS CODE: V03051007 I0
[2023-02-12 13:35:15.513211] <FF><E4>
[2023-02-12 13:35:15.529519] ASSERT [VariableStandaloneMm] /dvs/git/dirty/git-master_linux/out/nvidia/optee.t234/uefi/StandaloneMmOptee_RELEASE/edk2/MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c(3255): !(((INTN)(RETURN_STATUS)(Status)) < 0)

We have managed to hit ESC and enter setup and wandered around in the TUI, but we have been unable to get the system to boot.

Can anyone help us identify the error and avoid this failure mode in the future?

Thanks,
sam

Hi waldman,

Are you using the devkit or a custom board?
Which JetPack version is in use?

This is an assertion from UEFI, and it is what stops your boot at this point.

Hi @KevinFFF:

We are using a custom board for this specific test.
We are using L4T 35.1 // JP5.0.2.

We see that the optee ASSERT fails, but we don’t know what the ASSERT means. Any suggestions are welcome!

sam

Just upgrade JetPack to 5.1. This issue has already been fixed in JetPack 5.1.

Thanks @WayneWWW -

We have an upcoming deliverable and cannot upgrade to 5.1 at this time. It will be a couple of weeks until we can upgrade.

Is there a workaround that we can apply to 5.0.2 in the meantime?

Please modify the following line to increase the heap size for stmm.

diff --git a/core/arch/arm/kernel/stmm_sp.c b/core/arch/arm/kernel/stmm_sp.c
index bf90eef..b45cc90 100644
--- a/core/arch/arm/kernel/stmm_sp.c
+++ b/core/arch/arm/kernel/stmm_sp.c
@@ -76,7 +76,7 @@
 static const uint16_t ffa_storage_id = 4U;
 
 static const unsigned int stmm_stack_size = 4 * SMALL_PAGE_SIZE;
-static const unsigned int stmm_heap_size = 600 * SMALL_PAGE_SIZE;
+static const unsigned int stmm_heap_size = 750 * SMALL_PAGE_SIZE;

You can refer to the following thread on how to build and update optee:
OP-TEE on Jetson AGX Xavier - Jetson & Embedded Systems / Jetson AGX Xavier - NVIDIA Developer Forums

Awesome, thank you.
We compiled and installed the updated optee. Will get to stress testing this weekend and have a definitive result.

We got our stress test running and have ~1500 boot cycles without interruption. This stmm heap size increase fixed the problem. Thank you!

-sam
