I am facing an issue where, after many reboots (in the hundreds or thousands), the system swaps to the other slot (so ROOTFS_AB=1), and nvbootctrl shows that the former slot is now unbootable.
However, I would like to find the underlying reason for this behavior in order to narrow it down. Is it possible to see some traces from the early boot stages, e.g. mb1/mb2? Is the reason stored somewhere that can be read back at a later time?
Is it possible to provide mb1/mb2 binaries with more logs enabled? Or maybe via some option in the DTS?
May I know what your test steps are?
For instance,
did you stay on slot A for the reboot stress test (may I also know the detailed steps/commands), see slot A fail, and then fall back to slot B after 1000 reboot cycles?
Yes, I am doing either cold or warm reboots, starting on, say, slot A. After the system has been up for ~1 min, either a script requests a reboot or an external device cuts the power. After a while, which can be hundreds or thousands of reboot cycles, the system suddenly boots on slot B. If I carry on, after a while it goes back to slot A again.
I want to be able to understand where it fails in the boot chain mb1, mb2… and ideally what was the reason for the change of slot, since slot A after the phenomenon is still usable (by waiting for another boot slot swap or via nvbootctrl).
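To pin down which reboot cycle triggers the swap, one option is to record the active slot at every boot. The following is only a sketch: the log path and the function names are hypothetical, and it assumes that `nvbootctrl -t rootfs get-current-slot` prints the bare slot number.

```shell
#!/bin/sh
# Sketch: keep a per-boot history of the active slot so the first cycle
# that boots from the other slot can be identified afterwards.

# Hypothetical log location; override with the LOG_FILE environment variable.
LOG_FILE="${LOG_FILE:-/var/log/slot-history.log}"

# Print the slot recorded on the last line of the log (nothing if no entry yet).
last_logged_slot() {
    [ -f "$LOG_FILE" ] || return 0
    tail -n 1 "$LOG_FILE" | sed -n 's/.*slot=\([0-9][0-9]*\).*/\1/p'
}

# Append one timestamped entry and report when the slot differs from the previous boot.
log_slot() {
    current="$1"
    previous="$(last_logged_slot)"
    printf '%s slot=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$current" >> "$LOG_FILE"
    if [ -n "$previous" ] && [ "$previous" != "$current" ]; then
        echo "slot changed: $previous -> $current"
    fi
}
```

It could be called once per boot as root, for example `log_slot "$(nvbootctrl -t rootfs get-current-slot)"`, with the output of `nvbootctrl dump-slots-info` saved to a separate file whenever a change is reported.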
The logs from mb1 and mb2 do not seem to provide enough information. Is there a way to get custom binaries? We can contact NVIDIA through our local contact, the same way we did for the FSKP, as I understand you might not want to upload them here.
Can you ask internally whether such a custom build might be possible?
In the meantime, I will check your link for those assertion issues.
This is a cold reboot; it is the same as using the hardware reset button to restart the system.
Since you are cutting the power within a minute, it might be a timing issue where a background service has not completed before the system shuts down.
You may refer to the developer guide, Rootfs Selection.
Please add a check of these two background services, l4t-rootfs-validation-config.service and nv-l4tbootloader-config.service, before cutting off the power, for verification.
You should check sudo systemctl status l4t-rootfs-validation-config.service before cutting off the power.
The system should not shut down until the service reports status=0/SUCCESS.
for instance,
Process: 415 ExecStart=/opt/nvidia/l4t-rootfs-validation-config/l4t-rootfs-validation-config.sh (code=exited, status=0/SUCCESS)
Main PID: 415 (code=exited, status=0/SUCCESS)
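The check described above could be scripted along these lines. This is a minimal sketch, not an NVIDIA-provided tool: the function names and the 120-second default timeout are assumptions, and it simply looks for the status=0/SUCCESS marker in the `systemctl status` output of both services.

```shell
#!/bin/sh
# Sketch: block until both validation services report status=0/SUCCESS,
# so the reboot or power cut only happens after they have finished.

SERVICES="l4t-rootfs-validation-config.service nv-l4tbootloader-config.service"

# Succeeds if the text on stdin contains a successful exit marker.
status_is_success() {
    grep -q 'status=0/SUCCESS'
}

# Poll one unit once per second; return 0 on success, 1 after `limit` attempts.
wait_for_service() {
    unit="$1"
    limit="${2:-120}"   # assumed default, adjust as needed
    elapsed=0
    while [ "$elapsed" -lt "$limit" ]; do
        # systemctl status exits non-zero for inactive units, so only its output is used.
        if systemctl status --no-pager "$unit" 2>/dev/null | status_is_success; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Wait for every service in SERVICES; stop at the first one that times out.
wait_for_all_services() {
    for svc in $SERVICES; do
        if ! wait_for_service "$svc" "${1:-120}"; then
            echo "timeout waiting for $svc" >&2
            return 1
        fi
    done
}
```

A stress-test script could then run `wait_for_all_services 120 && sudo reboot`, or signal the external device to cut the power only after the function returns 0.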
Thanks for the suggestion. I will prepare a new build with the fixes for the UEFI assertion and make sure I wait for status=0/SUCCESS before rebooting. As this phenomenon is random, I will let the devkit run over the weekend.
We are still investigating. It would be very useful to have mb1 and mb2 binaries with a more verbose mode. Should we contact our NVIDIA representative directly for this request, or is it something that can be sorted out in this channel?