There are several topics about fail-over rootfs slot switching in JetPack 5.1.
Many users have asked about this, so this post shares some tips and the verified flow.
Verification steps on the Xavier NX devkit with eMMC
Step 1: Flash the board with rootfs A/B enabled
$sudo ROOTFS_AB=1 ./flash.sh jetson-xavier-nx-devkit-emmc mmcblk0p1
Step 2: After boot up, check current slot status
$sudo nvbootctrl -t rootfs dump-slots-info
Current rootfs slot: A
Active rootfs slot: A
num_slots: 2
slot: 0, retry_count: 3, status: normal
slot: 1, retry_count: 3, status: normal
Step 3: Corrupt the current file system (current slot: A)
$sudo rm -rf /lib
Step 4: Reset the board
Power-cycle the board to reset it.
Step 5: The rootfs A/B fail-over mechanism takes over
5-1. The kernel panics at boot because the file system is corrupted.
5-2. The watchdog triggers a reset after 120 s.
5-3. The board retries booting rootfs A (slot: 0) 3 times in total.
5-4. UEFI finds that rootfs A is unbootable (it was tried 3 times and failed) and triggers a reboot to switch the rootfs slot.
5-5. The board switches to rootfs B.
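As a rough illustration of the sequence in Step 5, the sketch below models the retry countdown in plain shell. It is a toy model only, not the UEFI implementation: the function name `simulate_failover` is hypothetical, and it assumes that every failed attempt costs one full watchdog timeout and ignores boot time, so the elapsed figure is a lower bound.

```shell
#!/bin/sh
# Toy model of the fail-over countdown described in Step 5.
# Assumption: each failed boot consumes one retry and one watchdog timeout.
# Usage: simulate_failover [retries] [watchdog_seconds]
simulate_failover() {
    retry_count=${1:-3}     # retry_count reported by nvbootctrl for a healthy slot
    watchdog_s=${2:-120}    # watchdog timeout from step 5-2
    elapsed=0
    while [ "$retry_count" -gt 0 ]; do
        retry_count=$((retry_count - 1))
        elapsed=$((elapsed + watchdog_s))
        echo "boot attempt on slot A failed, retry_count now $retry_count"
    done
    echo "slot A marked unbootable after about ${elapsed}s of watchdog time; switching to slot B"
}
```

With the values from this post (3 retries, 120 s watchdog), the model suggests waiting at least several minutes before concluding that fail-over did not happen.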
Step 6: After boot up, check current slot status again
$sudo nvbootctrl -t rootfs dump-slots-info
Current rootfs slot: B
Active rootfs slot: B
num_slots: 2
slot: 0, retry_count: 0, status: unbootable
slot: 1, retry_count: 3, status: normal
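For scripted checks, the slot dump can be filtered so that only slots that are not `normal` are reported. This is a minimal sketch that assumes the output format shown above stays exactly as printed; the function name `list_bad_slots` is hypothetical.

```shell
#!/bin/sh
# Report every slot whose status is not "normal".
# Reads the output of `nvbootctrl -t rootfs dump-slots-info` on stdin.
# Assumption: slot lines look like "slot: 0, retry_count: 0, status: unbootable".
list_bad_slots() {
    awk -F'[:,] *' '
        /^slot:/ && $6 != "normal" {
            print "slot " $2 " is " $6 " (retry_count " $4 ")"
        }'
}
```

On the board it could be used as `sudo nvbootctrl -t rootfs dump-slots-info | list_bad_slots`, which prints nothing when both slots are healthy.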
The flash log and the full serial console log are attached below for reference.
flash.log (70.4 KB)
serial.log (273.2 KB)
Methods to restore a corrupted rootfs slot
1. UEFI menu
a. Press `ESC` to enter the UEFI menu.
b. Choose Device Manager -> NVIDIA Configuration -> L4T Configuration.
c. OS chain A status: the value is Unbootable if UEFI attempts the recovery kernel; change it to Normal.
d. Save and exit, then reboot. UEFI will try Direct Boot.
2. User space command
From the running kernel, the user can restore the UEFI variable RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
or RootfsStatusSlotB-781e084c-a330-417c-b678-38e696380cb9
by writing the value 0 to it.
2-1. For AGX Xavier and other devices without QSPI flash:
a. Mount the esp partition to /opt/nvidia/esp
b. Write the variable to the esp partition
$cd /opt/nvidia/esp/EFI/NVDA/Variables/
$printf "\x07\x00\x00\x00\x00\x00\x00\x00" > /tmp/var_tmp.bin
$sudo dd if=/tmp/var_tmp.bin of=RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
c. Reboot. When the system boots to UEFI, UEFI writes the RootfsStatusSlotA value to the uefi_variable partition.
d. After the system boots to a rootfs successfully (for example, after restoring the rootfs A status while booted to rootfs B), check that RootfsStatusSlotA is restored.
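Because the variable name carries a long GUID, a small helper can reduce typing mistakes when restoring slot A or slot B. This is only a convenience sketch: the function name `rootfs_status_var` is hypothetical, and the GUID is copied from the variable names given above.

```shell
#!/bin/sh
# GUID suffix taken from the variable names in this post.
ROOTFS_STATUS_GUID=781e084c-a330-417c-b678-38e696380cb9

# Print the status variable file name for slot A or B; reject anything else.
rootfs_status_var() {
    case "$1" in
        A|B) printf 'RootfsStatusSlot%s-%s\n' "$1" "$ROOTFS_STATUS_GUID" ;;
        *)   echo "usage: rootfs_status_var A|B" >&2; return 1 ;;
    esac
}
```

For example, `sudo dd if=/tmp/var_tmp.bin of="$(rootfs_status_var B)"` would target slot B instead of slot A when run from the variables directory.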
2-2. For other devices with QSPI flash:
a. Write the variable through efivars
$cd /sys/firmware/efi/efivars/
$printf "\x07\x00\x00\x00\x00\x00\x00\x00" > /tmp/var_tmp.bin
$sudo chattr -i RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
$sudo dd if=/tmp/var_tmp.bin of=RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
b. The RootfsStatusSlotA variable is restored immediately.
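The 8-byte payload used in both methods can be built and inspected before it is written. In efivarfs, the first 4 bytes of a variable file are the little-endian attribute word, so `07 00 00 00` corresponds to non-volatile, boot-service and runtime access; the remaining 4 bytes are taken here to be the status value 0 mentioned above, and the same layout is assumed for the files on the esp partition. The function names are hypothetical. Octal escapes are used because `\x07` is not understood by every `sh` implementation of `printf` (it works in bash).

```shell
#!/bin/sh
# Emit the 8-byte payload from this post: 07 00 00 00 followed by 00 00 00 00.
make_normal_payload() {
    printf '\007\000\000\000\000\000\000\000'
}

# Decode 8 bytes from stdin into two little-endian 32-bit words.
# Assumption: word 1 is the attribute mask, word 2 is the slot status value.
describe_payload() {
    od -An -tx1 | awk '{
        printf "attributes=%s%s%s%s value=%s%s%s%s\n", $4, $3, $2, $1, $8, $7, $6, $5
    }'
}
```

Running `make_normal_payload > /tmp/var_tmp.bin` produces the same file as the `printf` line above, and `sudo cat RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9 | describe_payload` can be used afterwards to confirm what was stored.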
Known Issues
1. Xavier NX with an SD module may not work.
The watchdog is disabled by default on this module, and the cause is still under investigation. As a quick workaround, refer to this thread to enable it manually.
2. The “endless reboot” in this use case.
This is caused by a bug in UEFI whose root cause has been identified. The fix is under verification and may be included in a later JetPack release.