Hi, UEFI sometimes switches the boot sequence to the recovery partition (recovery boot). According to online sources, this happens during a kernel panic (if you power off the device or hit reset during boot, this doesn’t occur).
How can I intentionally trigger a kernel panic to reproduce this boot behavior? Or is there another possible cause for this?
I am aware of a boot attempt counter (default value is 3), and once it reaches zero, the UEFI boots into the recovery partition. Is there a way to disable this behavior remotely on a running device and fix the firmware for future units?
Jetson BSP version - 35.5.0.
*** Please note that this reply is generated by LLM automatically ***
To address the issue of the Jetson Orin NX UEFI switching the boot sequence to the recovery partition, we need to understand the possible causes and the mechanisms involved.
Possible Causes:
- Kernel Panic: A kernel panic during boot can cause the UEFI to switch to the recovery partition. According to the original report, simply powering off or resetting the device during boot does not have this effect.
- Boot Attempt Counter: The boot attempt counter is a mechanism that limits the number of boot attempts before switching to the recovery partition. If the counter reaches zero, the UEFI will boot into the recovery partition.
Intentionally Triggering a Kernel Panic:
To reproduce the boot behavior, you can intentionally trigger a kernel panic by:
- Using the SysRq key: Press Alt + SysRq and then the c key to crash the kernel (the p key only dumps the current registers). On a headless device, writing c to /proc/sysrq-trigger as root has the same effect.
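- Using the kernel.panic sysctl: This setting does not trigger a panic by itself. It sets how many seconds the kernel waits before rebooting after a panic (0 means no automatic reboot), so setting it to 1 makes the device reboot quickly once a panic has been triggered by another method.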
- Using a kernel module: Load a kernel module that intentionally triggers a kernel panic.
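The SysRq and kernel.panic steps above can be sketched in Python as follows. This is a minimal sketch, not a tested procedure for Jetson: the two procfs paths are the standard Linux ones, while the function names and the armed safety flag are made up for illustration. It must run as root, and calling trigger_crash with armed=True crashes the running system immediately.

```python
from pathlib import Path

SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")     # standard Linux SysRq trigger file
PANIC_TIMEOUT = Path("/proc/sys/kernel/panic")  # backing file of the kernel.panic sysctl


def set_panic_timeout(seconds: int, path: Path = PANIC_TIMEOUT) -> None:
    """Make the kernel reboot `seconds` after a panic (0 disables auto-reboot)."""
    path.write_text(f"{seconds}\n")


def trigger_crash(path: Path = SYSRQ_TRIGGER, armed: bool = False) -> bool:
    """Write 'c' to the SysRq trigger, which crashes the kernel at once.

    Hypothetical safety guard: nothing is written unless `armed` is True.
    Returns True only if the write was attempted.
    """
    if not armed:
        return False
    path.write_text("c")
    return True
```

On a test unit, calling set_panic_timeout(1) followed by trigger_crash(armed=True) should produce one panic and one automatic reboot per run. Whether a panic triggered this late in boot counts as a failed boot attempt for the UEFI is not confirmed in this thread.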
Disabling the Boot Attempt Counter:
To disable the boot attempt counter remotely on a running device, you can try the following:
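- Using the efibootmgr command: efibootmgr can list and modify UEFI boot entries and the boot order, but its -b option only selects which boot entry to modify. It has no option for setting a boot attempt counter, so it is useful here mainly for inspecting the current boot configuration.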
- Using the UEFI firmware settings: Access the UEFI firmware settings and look for an option to disable the boot attempt counter.
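This thread does not name the UEFI variable that holds the boot attempt counter, so a cautious first step on a running device might be to list the variables Linux exposes and look for boot-related names. The sketch below assumes efivarfs is mounted at its usual location, /sys/firmware/efi/efivars; the function name find_efi_vars and the search keyword are illustrative only, and it only reads names, never values.

```python
from __future__ import annotations

from pathlib import Path

EFIVARS_DIR = Path("/sys/firmware/efi/efivars")  # usual efivarfs mount point on Linux


def find_efi_vars(keyword: str, efivars_dir: Path = EFIVARS_DIR) -> list[str]:
    """Return names of UEFI variables containing `keyword` (case-insensitive).

    Each efivarfs entry is named <VariableName>-<VendorGUID>.
    """
    if not efivars_dir.is_dir():
        return []  # not booted via UEFI, or efivarfs is not mounted
    names = (entry.name for entry in efivars_dir.iterdir())
    return sorted(n for n in names if keyword.lower() in n.lower())


if __name__ == "__main__":
    for name in find_efi_vars("boot"):
        print(name)
```

Any variable found this way should be treated as read-only until NVIDIA documentation confirms its meaning, since writing the wrong UEFI variable can leave a device unbootable.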
Fixing the Firmware for Future Units:
To fix the firmware for future units, you can:
- Update the UEFI firmware: Update the UEFI firmware to a version that does not have the boot attempt counter issue.
- Modify the UEFI configuration: Modify the UEFI configuration to disable the boot attempt counter or set it to a high value.
Comparison with Other Jetson Models:
The UEFI behavior of the Jetson Orin NX is similar to that of other Jetson models, such as the Jetson AGX Orin and Jetson Orin Nano. However, the specific mechanisms and settings may vary depending on the model and firmware version.
Additional Information:
- Jetson BSP version 35.5.0 is a relatively old version, and it’s recommended to update to a newer version to ensure you have the latest features and bug fixes.
- The example.sh script in the OP-TEE source package can be used to generate an EKS image with a test key, which can be useful for testing the EKS feature.
In summary, to reproduce the boot behavior, you can intentionally trigger a kernel panic using the methods above. To change the boot attempt counter remotely, you can check efibootmgr and the UEFI firmware settings, though neither is confirmed here to expose the counter. To fix the firmware for future units, you can update the UEFI firmware or modify the UEFI configuration.
Actually, the point here is that this mechanism won’t be disabled.
If a kernel panic happens 3 times in a row, it will most likely keep happening forever.
In that case, your board would just be stuck in a boot loop, rebooting again and again. I don’t think that makes anything better.
What you need to do is enable the UART serial console and check what is causing the repeated kernel panics.
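Once the serial console output is being saved to a file, a small script can help spot the failing boots. The sketch below assumes the log is plain text and relies on the "Kernel panic - not syncing" prefix that Linux prints when it panics; the function name find_panics and the amount of context kept are arbitrary choices for illustration.

```python
from __future__ import annotations

import re

# Linux prints this prefix on the console when it panics.
PANIC_RE = re.compile(r"Kernel panic - not syncing: (?P<reason>.*)")


def find_panics(log_text: str, context: int = 3) -> list[dict]:
    """Return one record per panic line found in a captured console log.

    Each record holds the 1-based line number, the panic reason, and up to
    `context` preceding lines, which often name the driver involved.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        match = PANIC_RE.search(line)
        if match:
            hits.append({
                "line": i + 1,
                "reason": match.group("reason").strip(),
                "before": lines[max(0, i - context):i],
            })
    return hits
```

Running this over the logs from several cold starts shows whether the same reason appears every time. A boot that hangs without a panic line will not be detected this way and needs to be read manually.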
Thanks for the reply. Am I correct in understanding that the counter only decrements if the kernel panics? That is, the system hangs, the watchdog isn’t reset, and the counter decreases. If power is lost at that moment, will the counter reset back to 3? I don’t recall the exact cause of the panic, but it seems to be related to cold booting. Perhaps a driver or something else triggers the panic at low temperatures, and once the temperature rises, the device boots normally. This suggests the file system is fine and the device simply rebooted three times due to the cold.
We need to run a cold start test to figure out what’s causing the kernel panic.
It might not be a “kernel panic”. A kernel panic is just one possible cause of this problem.
More precisely, this situation occurs whenever the board fails to reach the file system and run a specific systemd service.
That systemd service sets the counter back to 3, no matter what its previous value was.
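The mechanism described here can be modeled with a few lines of Python. This is only a sketch of the logic as stated in this thread, not the actual firmware code: it assumes the firmware decrements the counter before each normal boot attempt, which the thread does not confirm, and it does not model what happens after the recovery partition has booted.

```python
MAX_ATTEMPTS = 3  # default counter value mentioned in this thread


def simulate_boots(outcomes, max_attempts=MAX_ATTEMPTS):
    """Replay boot outcomes (True = the systemd service ran successfully).

    Returns a list of (target, counter_after) tuples, one per boot.
    """
    counter = max_attempts
    history = []
    for reached_service in outcomes:
        if counter == 0:
            # Attempts exhausted: UEFI falls back to the recovery partition.
            history.append(("recovery", counter))
            continue
        counter -= 1  # assumption: decremented before each normal boot attempt
        if reached_service:
            counter = max_attempts  # the systemd service restores the counter
        history.append(("normal", counter))
    return history
```

For example, simulate_boots([False, False, True]) ends with the counter back at 3, while three failures in a row leave it at 0 so that the fourth boot goes to recovery. Under this model, a cold device that fails three times before warming up would end up in recovery even though its file system is intact.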