After running normally for three months, the device failed to boot into the system on one startup and instead dropped directly into the UEFI shell.
I entered the UEFI interface through the serial console to check the boot order and found that it had changed: the SSD was now ranked below UEFI HTTPv4, UEFI HTTPv6, UEFI Shell, etc.
I reset the boot order and made the SSD the first boot option, and the system started normally.
What could cause the entire boot order to change or be reset?
I ran an experiment and found that replacing the SSD with a different one does change the boot order. But my SSD was not replaced before or after this happened.
I also wondered whether the SSD had gone unrecognized on one boot and was then recognized on a later reboot, so that the firmware treated it as a newly added device and changed the boot order. But in actual testing, as long as it is the same SSD, the boot order stays the same no matter how it is unplugged and plugged back in.
So are there any other reasons that could lead to this behavior? Thank you.
Are you using the devkit or a custom board for Orin NX?
Is your NVMe SSD not being recognized reliably?
Yes, the boot order changes when you connect a new boot device, and by default the new device should be added to the top of the order.
You can check this in UEFI menu → Device Manager → NVIDIA Configuration → Boot Configuration → Add new devices to top or bottom of boot order → Top
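Once the system has booted into Linux, the current order can also be checked from userspace. The following is only a minimal sketch, assuming `efibootmgr` is installed and prints the usual `BootOrder:` and `BootXXXX*` lines; the sample text and entry labels are illustrative and were not captured from this device, and the helper names are hypothetical.

```python
import re

# Illustrative sample only; real labels on your board will differ.
SAMPLE = """
BootCurrent: 0001
Timeout: 5 seconds
BootOrder: 0003,0004,0002,0001
Boot0001* UEFI NVMe SSD
Boot0002* UEFI Shell
Boot0003* UEFI HTTPv4
Boot0004* UEFI HTTPv6
"""

def parse_boot_order(efibootmgr_output):
    """Return (number, label) pairs in the order listed by BootOrder."""
    order = []
    labels = {}
    for line in efibootmgr_output.splitlines():
        line = line.strip()
        if line.startswith("BootOrder:"):
            numbers = line.split(":", 1)[1].split(",")
            order = [n.strip().upper() for n in numbers if n.strip()]
            continue
        # Entry lines look like "Boot0001* Label"; newer efibootmgr
        # versions may append a device path after a tab, which is dropped.
        m = re.match(r"Boot([0-9A-Fa-f]{4})\*?\s+(.*)", line)
        if m:
            labels[m.group(1).upper()] = m.group(2).split("\t")[0].strip()
    return [(num, labels.get(num, "<unknown>")) for num in order]

def position_of(entries, keyword):
    """Index of the first entry whose label contains keyword, or None."""
    for i, (_, label) in enumerate(entries):
        if keyword.lower() in label.lower():
            return i
    return None

entries = parse_boot_order(SAMPLE)
ssd_index = position_of(entries, "NVMe")
```

On a live system the text could come from `subprocess.run(["efibootmgr"], capture_output=True, text=True).stdout` instead of `SAMPLE`. An index other than 0 for the SSD would match the situation described above, where the SSD ranks below the HTTP and Shell entries.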
At first, I also suspected that it was caused by an unstable SSD.
But I have tried the following sequence: power off, unplug the SSD, boot, power off, insert the SSD, boot. The UEFI boot order did not change.
The UEFI boot order only changes when the SSD is replaced by a different SSD.
Therefore, even if the SSD is unstable and goes unrecognized during a boot, that should not change the UEFI boot order.
Are there any other possible reasons?
I will also try setting new devices to be added to the top of the boot order, but the issue occurs so rarely that it will be difficult to confirm whether the setting is effective.
We would need clear reproduction steps and the full log to check the issue further.
Is the issue specific to the current SSD? (i.e. can you reproduce similar behavior with another SSD?)
It is difficult to reproduce this issue. We have tracked nearly a hundred devices of this custom board model. So far, it has occurred once on each of three devices in the past year. After the boot order was reset, it has not recurred.
Also, where would the logs you mentioned be saved? When the boot order changes and the system fails to boot from the SSD, the logs cannot be saved on the SSD. So can the debug boot log only be viewed at the moment the issue occurs?
If it is hard to reproduce, I would suggest monitoring the issue or performing a stress test.
UEFI (bootloader) debug logs are not saved to the rootfs.
You may also need to use the debug UEFI firmware to reproduce the issue and provide the full serial console log when you hit it.
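For the monitoring part, one possible approach is to record the boot order on every boot that reaches Linux and flag any difference from the previous boot. This is only a sketch under assumptions: it relies on the standard Linux efivarfs layout (a 4-byte attributes field followed by little-endian UINT16 entries), the helper names and the snapshot file location are hypothetical, and it cannot observe the failing boot itself, so the serial console log is still needed.

```python
import json
import os
import struct

# GUID of the EFI global variable namespace defined by the UEFI specification.
EFI_GLOBAL_GUID = "8be4df61-93ca-11d2-aa0d-00e098032c8c"
BOOT_ORDER_PATH = "/sys/firmware/efi/efivars/BootOrder-" + EFI_GLOBAL_GUID

def decode_boot_order(raw):
    """Decode an efivarfs BootOrder blob into a list like ['0003', '0001']."""
    payload = raw[4:]  # efivarfs prepends a 4-byte attributes field
    if len(payload) % 2:
        raise ValueError("BootOrder payload is not a whole number of UINT16s")
    count = len(payload) // 2
    return ["%04X" % n for n in struct.unpack("<%dH" % count, payload)]

def check_against_snapshot(current, snapshot_path):
    """Save current; return the previous order if it differed, else None."""
    previous = None
    if os.path.exists(snapshot_path):
        with open(snapshot_path) as f:
            previous = json.load(f)
    with open(snapshot_path, "w") as f:
        json.dump(current, f)
    if previous is not None and previous != current:
        return previous
    return None

def read_live_boot_order():
    """Read the BootOrder variable on a running system (needs efivarfs)."""
    with open(BOOT_ORDER_PATH, "rb") as f:
        return decode_boot_order(f.read())
```

Calling `check_against_snapshot(read_live_boot_order(), path)` from a boot-time service, with `path` on persistent storage, would return the old order whenever something changed between two successful boots, such as entries being re-added in a different position.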