I ran a test that repeatedly reboots the Orin NX, each time 20 seconds after nv-l4t-bootloader-config.service has come up successfully on slot A. However, it unexpectedly switched to slot B and reported that the bootloader of slot A was unbootable.
I then powered the Orin NX module off and on again, and the bootloader of slot A automatically returned to bootable. How did that happen?
*** Please note that this reply is generated by LLM automatically ***
Based on the provided context, it appears that the Jetson Orin NX module is experiencing an issue with the bootloader, where it repeatedly reboots every 20 seconds after successfully booting up with nv-l4t-bootloader-config.service at slot A. However, it unexpectedly switches to slot B and reports that the bootloader of slot A is unbootable.
This behavior suggests that there might be an issue with the bootloader configuration or the slot switching mechanism. Here are a few possible explanations:
Bootloader configuration issue: nv-l4t-bootloader-config.service might be configuring the bootloader incorrectly, leading to the repeated reboots and slot switching. It's possible that the configuration is not being persisted correctly, causing the bootloader to switch to slot B.
Slot switching mechanism issue: The slot switching mechanism might be malfunctioning, causing the bootloader to switch to slot B unexpectedly. This could be due to a hardware or firmware issue.
Power cycle issue: The fact that the bootloader of slot A returns to bootable automatically after powering off and on the Orin NX module suggests that there might be a power cycle-related issue. It's possible that the power cycle is causing the bootloader to reset or reconfigure itself.
To troubleshoot this issue, you can try the following steps:
Check the bootloader configuration: Verify that nv-l4t-bootloader-config.service is configuring the bootloader correctly. You can check the configuration files and logs to ensure that the configuration is being applied correctly.
Check the slot switching mechanism: Investigate the slot switching mechanism to ensure that it is functioning correctly. You can check the logs and configuration files to see if there are any errors or issues related to slot switching.
Check the power cycle: Verify that the power cycle is not causing any issues with the bootloader. You can try powering off and on the Orin NX module multiple times to see if the issue persists.
Check the bootloader logs: Check the bootloader logs to see if there are any errors or issues related to the repeated reboots and slot switching.
Try updating the bootloader: Try updating the bootloader to the latest version to see if it resolves the issue.
In comparison to other Jetson modules, the Orin NX module has a similar bootloader configuration and slot switching mechanism. However, the specific issue you are experiencing might be unique to the Orin NX module or the specific hardware configuration.
To provide more detailed assistance, I would need more information about the specific hardware configuration, the bootloader configuration, and the logs related to the issue.
Actually, systemd runs many services to initialize the system after it has booted up.
You should run $ sudo systemctl status nv-l4t-bootloader-config to confirm that the background service has executed successfully (status=0/SUCCESS) before rebooting the system.
Please add a status check for this background service.
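One possible way to script that status check is sketched below. It is only an illustration: the function names, the 2-second polling interval, and the 120-second default timeout are made up, and it assumes the unit's main process exits once its work is done, so that `systemctl show` reports `ExecMainCode=1` (exited) and `ExecMainStatus=0`.

```shell
#!/bin/sh
# Hypothetical helper: only reboot once nv-l4t-bootloader-config has exited with status 0.
UNIT="nv-l4t-bootloader-config.service"

# Read "Key=Value" lines (as printed by `systemctl show`) on stdin. Succeed only
# if the main process has exited (ExecMainCode=1, i.e. CLD_EXITED) with exit
# status 0 and systemd reports Result=success.
service_finished_ok() {
    code="" exit_status="" result=""
    while IFS='=' read -r key value; do
        case "$key" in
            ExecMainCode)   code="$value" ;;
            ExecMainStatus) exit_status="$value" ;;
            Result)         result="$value" ;;
        esac
    done
    [ "$code" = "1" ] && [ "$exit_status" = "0" ] && [ "$result" = "success" ]
}

# Poll the unit until it has finished successfully or the timeout (seconds) expires.
wait_for_bootloader_config() {
    timeout="${1:-120}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if systemctl show "$UNIT" -p ExecMainCode -p ExecMainStatus -p Result \
            | service_finished_ok; then
            return 0
        fi
        sleep 2
        elapsed=$((elapsed + 2))
    done
    return 1
}

# Example use in a reboot test (not executed here):
#   wait_for_bootloader_config 120 && sudo reboot
```

Checking `ExecMainCode` in addition to `Result` matters because `Result=success` is also what systemd reports for a unit that has not run yet.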
If possible, please move to the latest JP-5 release, since it includes several bug fixes.
We face a very similar issue. We are on 35.6.2, and the assertions were addressed. Initially we thought this could come from our custom board, but now we are able to reproduce it on an official devkit.
From our current investigation, 25% of the modules seem to be affected by this boot swap issue; the others do not show this phenomenon.
Can the internal team try to reproduce this? Surely you must have some stress test environment available.
Sadly no, and this impacts the production line, so many people from different departments in my company have been working to identify the culprit for months now.
We have set up a bench of stress tests and tried countless experiments, but no luck so far. Lately we were able to reproduce it on an official devkit, and as you face a similar problem…
Current knowledge points toward an issue in the PCIe/NVMe driver within UEFI, but who knows.
Yep, that's also what we observe. The "unbootable" flag for the rootfs seems to persist across a power cycle, unlike the "unbootable" flag for the bootloader, which is cleared at power cycles.
Also, when the error happens twice (A→B→A) during stress tests without a power cycle in between, bootloader A is first marked unbootable and the module boots on B; then, when the error occurs again, it reboots on A (which is marked as normal again) and B gets marked unbootable.
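To compare how the two flags behave across soft reboots and power cycles, it may help to capture both slot tables on every boot. The sketch below is only a suggestion: `log_slot_status` and the log path are hypothetical, and it assumes `nvbootctrl dump-slots-info` and `nvbootctrl -t rootfs dump-slots-info` are available on the image, which should be the case when ROOTFS_AB is enabled.

```shell
#!/bin/sh
# Hypothetical helper: append a timestamped dump of both slot tables to a log file.
log_slot_status() {
    log_file="$1"
    {
        echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
        echo "--- bootloader slots ---"
        nvbootctrl dump-slots-info
        echo "--- rootfs slots ---"
        nvbootctrl -t rootfs dump-slots-info
    } >> "$log_file" 2>&1
}

# Example use, run as root early in each boot (path is illustrative):
#   log_slot_status /var/log/slot-status.log
```

The raw output is stored as-is, so the bootloader and rootfs entries before and after a power cycle can be compared side by side.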
According to the above, may I know the SKUs of those specific Orin modules?
Please also share the test steps for reference; we would like to reproduce this issue locally.
We have other 699-13767-0000-301 G.1 modules that we stress tested for over 15,000 reboot loops without any issue. So there is no specific board revision that shows the issue systematically. I will try to get more info from production about the quarantined modules, to build a more complete list of board revisions for the impacted modules.
The test setup is simple: the modules are flashed with ROOTFS_AB, and a script installed on both the A and B slots checks the current slot, logs if it differs from the previous slot (including the number of reboots since the last slot switch), and then reboots. The script always waits for an uptime of 1 minute before running, so as not to trigger the built-in rollback feature.
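For anyone who wants to reproduce this, a rough sketch of such a script follows. It is not the actual script used in these tests: the file paths and function names are hypothetical, it assumes `nvbootctrl get-current-slot` is used to read the slot, and it assumes the state file lives on storage that is visible from both rootfs slots.

```shell
#!/bin/sh
# Hypothetical reboot-loop script; paths and names are illustrative only.
STATE_FILE="${STATE_FILE:-/mnt/shared/slot-test.state}"   # must be shared by slots A and B
LOG_FILE="${LOG_FILE:-/mnt/shared/slot-test.log}"

# Compare the current slot with the one stored in the state file ("<slot> <count>").
# Print a line if the slot changed, then store the updated state.
record_boot() {
    state_file="$1"
    current="$2"
    prev=""
    count=0
    if [ -r "$state_file" ]; then
        read -r prev count < "$state_file"
    fi
    count="${count:-0}"
    if [ -n "$prev" ] && [ "$prev" != "$current" ]; then
        echo "slot switch $prev -> $current after $count reboots"
        count=0
    fi
    echo "$current $((count + 1))" > "$state_file"
}

# Block until the system has been up for at least the given number of seconds.
wait_for_uptime() {
    min="$1"
    while [ "$(cut -d. -f1 /proc/uptime)" -lt "$min" ]; do
        sleep 5
    done
}

main() {
    wait_for_uptime 60                      # stay clear of the built-in rollback window
    slot="$(nvbootctrl get-current-slot)"   # assumed way to read the current slot
    record_boot "$STATE_FILE" "$slot" >> "$LOG_FILE"
    sync
    reboot
}

# On the target, call `main` as root from a boot-time service (not executed here).
```

The counter stored next to the slot number is the number of reboots issued since the last recorded switch, which is the figure reported in the log line.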
Impacted modules average a few hundred reboots between occurrences, sometimes fewer. Non-impacted modules have done over 10k reboots without an issue. We tried both soft reboots with the script and more complex setups with power-off and external power cycling. On impacted modules, the issue happens with both soft and hard reboots.