UEFI waits in Boot Maintenance Manager during boot

Hello,

We use the Orin Nano 4GB SKU on a custom carrier board as a robot controller. The BSP is customized and based on L4T 35.4.1.

Recently there have been some cases where the device does not boot. On checking the debug console, we observed that UEFI stops during boot and waits for user input. On the device, the debug port is not interfaced to any other device; it is reserved for debugging. We are not sure what triggers UEFI to wait.

Since our system is headless and we have no need for the boot wait option in UEFI, I applied the suggestions/patches proposed in the forum, rebuilt UEFI, updated the BSP with the new binaries and reflashed. Afterwards I could confirm that the logs related to the boot wait and the test key warnings are gone and everything works as expected.

However, on some power-ons we still observe that the boot waits, and on the serial console it loops in the Boot Maintenance Manager.

Can someone help me figure out the cause of this behaviour? Our robots are in the deployment phase now, so we would like to address this issue soon.

recent-boot-issues.log (46.5 KB)
normal-boot-logs-with-uefi-boot-wait-disabled.log (32.7 KB)

Hi bhuvanchandra.dv1,

\xFF\xE1
..
\x1B[2J\x1B[04D\x1B[=3h\x1B[2J\x1B[09D

Do you know where the above HEX bytes come from?
I don’t see them in the serial console log on my Orin Nano devkit.

Do you have any USB/serial device connected to the Orin Nano?

Please share the full steps you performed, including cloning UEFI source → modify source → build binary → replace binary.
You should not enter the UEFI menu if you’ve disabled the feature correctly.

Do you know where the above HEX bytes come from?
I don’t see them in the serial console log on my Orin Nano devkit.

I connected the debug UART lines to a logic analyzer. When the RX line is floating, there is some noise on it whenever there is data on the TX line, and the logic analyzer sometimes decodes that noise as random frames. I think we can ignore them.

Do you have any USB/serial device connected to the Orin Nano?

No specific device. I only connect the lines to the logic analyzer to check the data. In production, nothing is connected to the debug port.
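
To tell such stray frames apart from normal console output, a captured log can be scanned for bytes with the high bit set, since plain ASCII console text and ANSI escape sequences such as \x1B[2J never contain them. This is only a rough sketch: it assumes the capture tool writes non-printable bytes as escaped text of the form \xNN, as in the excerpt quoted above, and the function name find_stray_bytes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list log lines containing escaped bytes >= 0x80
# (e.g. \xFF, \xE1). Assumes the log stores raw bytes as literal "\xNN" text.
find_stray_bytes() {
    # First hex digit 8-F means the high bit is set; \x1B (ESC) is not matched.
    LC_ALL=C grep -n -E '\\x[89A-Fa-f][0-9A-Fa-f]' "$1"
}
```

Running find_stray_bytes on a log file prints each suspicious line with its line number, and returns a non-zero status when nothing is found.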

Please share the full steps you performed, including cloning UEFI source → modify source → build binary → replace binary.
You should not enter the UEFI menu if you’ve disabled the feature correctly.

I followed the instructions in "Build with docker · NVIDIA/edk2-nvidia Wiki · GitHub" and in "After I reflash the uefi compile with docker, device not working correctly - #6 by rex.ch.lin":

$ export EDK2_DEV_IMAGE="ghcr.io/tianocore/containers/ubuntu-22-dev:latest"
$ export EDK2_USER_ARGS="-v \"${HOME}\":\"${HOME}\" -e EDK2_DOCKER_USER_HOME=\"${HOME}\""
$ export EDK2_BUILD_ROOT="${HOME}/build"
$ export EDK2_BUILDROOT_ARGS="-v \"${EDK2_BUILD_ROOT}\":\"${EDK2_BUILD_ROOT}\""
$ alias edk2_docker="docker run -it --rm -w \"\$(pwd)\" ${EDK2_BUILDROOT_ARGS} ${EDK2_USER_ARGS} \"${EDK2_DEV_IMAGE}\""
$ edk2_docker init_edkrepo_conf
$ edk2_docker edkrepo manifest-repos add nvidia https://github.com/NVIDIA/edk2-edkrepo-manifest.git main nvidia
$ edk2_docker edkrepo clone nvidia-uefi-r35.4.1-updates NVIDIA-Platforms r35.4.1-updates
$ edk2_docker edk2-nvidia/Platform/NVIDIA/Jetson/build.sh

After a successful build, I took the relevant files from images and deployed them on our BSP:

BOOTAA64_Jetson_RELEASE.efi -> bootloader/BOOTAA64.efi
L4TConfiguration_Jetson_RELEASE.dtbo -> bootloader/L4TConfiguration.dtbo
uefi_Jetson_RELEASE.bin -> bootloader/uefi_jetson.bin

Then I flashed them using:

sudo ./flash.sh -k A_cpu-bootloader -c bootloader/t186ref/cfg/flash_t234_qspi.xml jetson-orin-nano-rapyuta-jiri nvme0n1p1
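
For reference, the file copy step could be scripted roughly as below, so that a missing build artifact is caught before flashing. This is a minimal sketch, not part of the official tooling: the function name deploy_uefi_images and its two directory arguments are hypothetical, and only the three file mappings come from the list above.

```shell
#!/bin/sh
# Hypothetical helper: copy built UEFI artifacts into the BSP bootloader dir.
#   $1 = directory holding the build output (the "images" directory)
#   $2 = root of the BSP tree (the directory that contains bootloader/)
deploy_uefi_images() {
    images_dir=$1
    bsp_dir=$2
    # "source:destination" pairs, as listed in the post.
    for pair in \
        "BOOTAA64_Jetson_RELEASE.efi:BOOTAA64.efi" \
        "L4TConfiguration_Jetson_RELEASE.dtbo:L4TConfiguration.dtbo" \
        "uefi_Jetson_RELEASE.bin:uefi_jetson.bin"
    do
        src=${pair%%:*}
        dst=${pair##*:}
        if [ ! -f "$images_dir/$src" ]; then
            echo "missing build artifact: $images_dir/$src" >&2
            return 1
        fi
        cp "$images_dir/$src" "$bsp_dir/bootloader/$dst" || return 1
    done
}
```

It would be called as deploy_uefi_images <images dir> <BSP dir> before running flash.sh.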

Addendum: Please find attached the diff of edk2-nvidia and the flash logs.
flash-logs.zip (10.7 KB)

edk2-nvidia.zip (909 Bytes)

Is that noise caused by your HW design? I don’t see it on the devkit.

Apart from the debug port, are there any devices connected to other USB/UART ports?

It seems your steps and modifications in UEFI source are correct to disable entering UEFI menu.

This command will only update bootloader partition in slot A.
Could you also try to use initrd flash script to flash the whole board to apply the change?

Is that noise caused by your HW design? I don’t see it on the devkit.

It seems to be an issue on our custom board: the RX line is floating. We should add a weak pull-up on the RX line on the 3v3 side. I have reported this to our electrical team.

Apart from the debug port, are there any devices connected to other USB/UART ports?

On USB we have a USB-UART controller. On the other UART ports, we have an MCU interfaced.

This command will only update bootloader partition in slot A.

Yes. We disabled ROOTFS_AB.

Could you also try to use initrd flash script to flash the whole board to apply the change?

I will give it a try. But it will be very inconvenient to reflash the full OS to update the bootloader, on site. The robot must undergo an onboarding process, which will take time.

This seems to be an issue that should be resolved on the HW side.

You can remove all of them and check whether it still enters the UEFI menu.
If it does not, you can connect them back one by one to find which one sends data to the Jetson and causes it to enter the UEFI menu.

They are different. Boot chain A/B is enabled by default even if you don’t enable rootfs A/B (i.e. the bootloader still has slots A and B as a failover mechanism).

This is just to debug and check whether it helps in your case.
In most cases, you only need to update the bootloader partition to apply a fix in UEFI.

@KevinFFF
Unfortunately no luck either, this issue is causing productivity issues for us at the customer site. If you have any other insights, please let me know.

Can you reproduce this issue on the devkit? I cannot enter the UEFI menu after removing that feature in UEFI.

Can you reproduce this issue on the devkit? I cannot enter the UEFI menu after removing that feature in UEFI.

In my previous post, I mentioned that UEFI enters the boot wait during boot on our custom board with default UEFI settings. On the devkit, unlike our board, the RX line is pulled up and has no noise, so there are no random boot waits. However, even after disabling the boot wait on the UEFI side as suggested in the forum, our custom board still enters a different configuration mode, which is quite confusing. I’m trying to identify why the boot still waits even though the relevant settings are disabled. Are we possibly missing some additional configuration?

Please do not close the ticket without finding an appropriate solution; if there is any other way to receive support, please let me know.

Do you think the RX line of the debug UART is causing the current issue?

After applying the following change, you should not enter the UEFI menu.
It has been verified by us and by many forum users.
Close or hide UEFI menu - #3 by KevinFFF
I’m not sure whether a step failed or was missed, causing it to still enter the menu.

You can also refer to the following thread to remove the 5s delay in UEFI.
Q: There is a wait in UEFI stage. How to disable it to accelerate system booting?

I just checked this log again. Why doesn’t it enter the kernel and boot up completely?

I just checked this log again. Why doesn’t it enter the kernel and boot up completely?

I intentionally did not take the kernel logs to capture only early boot logs and UEFI.

Do you think the RX line of the debug UART is causing the current issue?

The floating RX line is what stops UEFI during boot with the default settings. But it might not be the cause of the boot wait that still occurs after the feature is disabled.

You can also refer to the following thread to remove the 5s delay in UEFI.

I will follow it and redo it.

@KevinFFF

I rebuilt the UEFI again. You can check the changes here: disable test Key and boot wait features/options · rapyuta-robotics/edk2-nvidia@bdc9647 · GitHub (as far as I can see, the only difference from the diff I shared earlier is that the wait is reduced from 5 seconds to 0).

I flashed the full OS and I’m trying to reproduce the issues, so far looks okay. I will deploy this in our production setup and reply if I see this issue again.

I attached the boot logs.
boot-logs.txt (90.6 KB)

@KevinFFF

I’m at a customer site today and tested 10 robots. In some cases the robots do not boot up. I ran five power on/off cycles. The issue is not consistent on any of the robots; the occurrence is random. When a robot was in the boot wait state, I connected to the debug port, and every robot in that state displayed the UEFI menu. Note that all these robots are running the custom UEFI with the boot wait configurations disabled, as per your previous suggestion.

Based on the current tests, the software changes made to the UEFI are either insufficient or not applied at runtime.

Please find the attached logs. Some log files cover a case where after flashing the custom UEFI, upon the first boot it ends in the UEFI menu.
boot-lock-logs.zip (32.8 KB)
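
When many robots and power cycles are involved, a small script can sort the captured serial logs into boots that reached the menu and boots that did not. The sketch below assumes one log file per boot with a .log extension in a single directory, and uses the "Boot Maintenance Manager" text mentioned in the first post as the marker; the function name list_menu_boots is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the log files in directory $1 that show the
# UEFI menu. Assumes one serial capture per boot, named *.log.
list_menu_boots() {
    # -a: treat logs as text even if they contain stray binary bytes
    # -l: print only the names of matching files
    grep -l -a "Boot Maintenance Manager" "$1"/*.log 2>/dev/null
}
```

For example, list_menu_boots ./captures | wc -l gives the number of boots that ended in the menu.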

Do you mean the issue is specific to your production setup?
(i.e. you can not reproduce the issue on the devkit after disabling the UEFI menu)

Did you apply the modified UEFI binary (with both “UEFI menu disabled” and “auto boot time to 0s”) to each device?
How did you apply the change? Did you update only the UEFI partition, or flash the whole board?

As I understand it, you should not enter the UEFI menu after applying those changes.
You could also flash the debug UEFI binary and check the log to find out why it still enters the UEFI menu.

I would suggest you fix the floating RX line issue first.

@KevinFFF, I flashed the fixed UEFI binaries on all 10 robots using the same command I posted earlier in this thread:

sudo ./flash.sh -k A_cpu-bootloader -c bootloader/t186ref/cfg/flash_t234_qspi.xml jetson-orin-nano-rapyuta-jiri nvme0n1p1

A couple of hours ago, I also reflashed the full OS on all 10 robots:

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --flash-only --showlogs --network usb0 --massflash 1

After this, it seems the UEFI changes have been fully applied, as I did not encounter the boot wait in the UEFI menu.

Does this mean I need to flash the full BSP again even if I only need to update the bootloader, or are there any steps I’m missing?

Regarding the hardware fix, we have considered it for the next hardware iteration. For now, we are implementing a software fix only.

This command updates the UEFI partition in slot A.
If you were booting from slot A, you would get the change after applying it.
If you were booting from slot B, the UEFI in slot B is still the original one, so you would not get the change.

Slot A/B for the bootloader is enabled by default even if you don’t enable the redundant root file system (A/B).
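
To see which bootloader slot a robot is actually running from, the nvbootctrl utility on the target can be queried. The sketch below is an assumption-laden illustration: it assumes that nvbootctrl get-current-slot prints a bare slot index (0 or 1), and the helper name slot_name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a numeric slot index to its letter (0 -> A, 1 -> B).
slot_name() {
    case "$1" in
        0) echo A ;;
        1) echo B ;;
        *) echo "unknown slot: $1" >&2; return 1 ;;
    esac
}

# On the target (needs root), assuming the command prints only the index:
#   slot_name "$(sudo nvbootctrl get-current-slot)"
```

If the result is B, a UEFI update written only to slot A would not be the one that runs.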

@KevinFFF
Does this mean I have to update both A_cpu-bootloader and B_cpu-bootloader?

Updating only A_cpu-bootloader is meant for verification, and it assumes that you are booting from slot A.
We suggest you update both A_cpu-bootloader and B_cpu-bootloader, or simply re-flash the whole board to apply the change.
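
Updating both slots could be wrapped in a small loop around the flash.sh command posted earlier in this thread. This is a rough sketch under stated assumptions: the slot B partition name is taken to be B_cpu-bootloader, mirroring the slot A name in that command; the board most likely has to be put back into recovery mode before each invocation; and the function name flash_both_slots and the DRY_RUN variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: flash the cpu-bootloader partition of both slots.
#   $1 = board config name, $2 = root device (same arguments as flash.sh)
# Set DRY_RUN=1 to only print the commands. Run from the Linux_for_Tegra dir.
flash_both_slots() {
    board=$1
    rootdev=$2
    run=""
    [ "${DRY_RUN:-0}" = "1" ] && run=echo
    for part in A_cpu-bootloader B_cpu-bootloader; do
        if [ -z "$run" ]; then
            # Assumption: flash.sh needs the board in recovery mode each time.
            printf 'Put the board in recovery mode for %s, then press Enter: ' "$part"
            read -r _
        fi
        $run sudo ./flash.sh -k "$part" \
            -c bootloader/t186ref/cfg/flash_t234_qspi.xml \
            "$board" "$rootdev" || return 1
    done
}
```

With DRY_RUN=1, flash_both_slots jetson-orin-nano-rapyuta-jiri nvme0n1p1 prints the two flash.sh command lines without touching the device, which makes it easy to review them first.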

Noted, thanks.