I have several NVIDIA Jetson Xaviers to which I have applied some security measures, including disk encryption. Recently one of them got into a boot loop after power was abruptly cut. The serial console shows “Checking in progress on 1 disk” on repeat, so it appears to me that the dirty bit is set and fsck is trying to run on the encrypted drive, but is probably failing because it's encrypted. So I have 2 questions:
1. Any recommendations for how to recover the computer with the boot loop problem?
2. How can I adjust my other computers so they don’t suffer the same fate? Due to the nature of my application, it's hard to prevent unexpected power loss.
I don’t know the answer. I will suggest, though, that if you can clone the rootfs and then work on the clone on another Linux PC that is set up for LVM and encrypted partitions, you can probably repair the clone under loopback. The corrected clone could then be flashed back.
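To make the clone-and-repair idea a bit more concrete, here is a rough, untested sketch that only builds the list of commands one might run on the host PC; it does not execute anything. It assumes the clone is a raw image of a single LUKS container with ext4 inside and no LVM layer, and that you can supply the key to "cryptsetup open" on the host, which may not be true if the key is tied to the Jetson itself. The names plan_clone_repair, rootfs_clone.img, /dev/loop0 and jetson_clone are made up for illustration.

```python
import shlex


def plan_clone_repair(image_path, loop_device="/dev/loop0", mapper_name="jetson_clone"):
    """Return the commands (as argv lists) for checking a cloned encrypted rootfs.

    Assumptions (not confirmed for any particular Jetson setup): the image is a
    raw copy of one LUKS container, the filesystem inside is ext4, and the loop
    device given here is free.
    """
    mapped = f"/dev/mapper/{mapper_name}"
    return [
        ["losetup", loop_device, image_path],              # attach the image to a loop device
        ["cryptsetup", "open", loop_device, mapper_name],  # unlock; prompts for the key
        ["e2fsck", "-f", mapped],                          # force a check of the unlocked filesystem
        ["cryptsetup", "close", mapper_name],              # lock it again
        ["losetup", "--detach", loop_device],              # release the loop device
    ]


if __name__ == "__main__":
    # Print the plan so it can be reviewed before running anything as root.
    for argv in plan_clone_repair("rootfs_clone.img"):
        print(shlex.join(argv))
```

Printing the plan instead of running it is deliberate: each step needs root, and it is worth keeping a second untouched copy of the clone before letting e2fsck modify anything.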
If this were a PC with a GRUB command line, I’d know how to add some arguments to boot to force it to go to fsck without trying to mount the drive first (which helps because mounting breaks some fsck scenarios). It might even be that this could be done from the UEFI boot command line, but I don’t know.
The gist is that when you add this to the kernel command line it goes to fsck: "fsck.mode=force" (optionally you could also add "fsck.repair=yes").
Does anyone know how to add that to the kernel command line from the UEFI console on a Jetson (which does not use GRUB)? That might avoid the need for clone->fix->reflash.
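If it turns out the boot flow reads an extlinux.conf-style file with an APPEND line (an assumption on my part, and I have not confirmed it for an encrypted Jetson), the arguments could be added there instead of at a console. A minimal sketch of that text edit, where add_kernel_args is a hypothetical helper and not part of any NVIDIA tool:

```python
def add_kernel_args(config_text, new_args=("fsck.mode=force", "fsck.repair=yes")):
    """Append any missing arguments to every APPEND line of an extlinux-style config.

    Assumption: kernel arguments live on lines whose first word is APPEND.
    Lines that already contain an argument are not given a duplicate.
    """
    out = []
    for line in config_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].upper() == "APPEND":
            missing = [arg for arg in new_args if arg not in tokens]
            if missing:
                line = line.rstrip() + " " + " ".join(missing)
        out.append(line)
    return "\n".join(out) + "\n"
```

The function works on a string, so it can be tried on a copy of the file first; running it twice on the same text leaves the second result unchanged. Whether the initrd on an encrypted rootfs actually honors these arguments is something I can't confirm.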