Rootfs A/B redundancy fail-over mechanism in JetPack 5.1

There are several topics about fail-over rootfs slot switching in JetPack 5.1.
Many users have asked about this, so we are sharing some tips and the verified flow in this post.

Verification steps on the Xavier NX devkit with eMMC

Step 1: Flash the board with rootfs A/B enabled 
$sudo ROOTFS_AB=1 ./flash.sh jetson-xavier-nx-devkit-emmc mmcblk0p1

Step 2: After boot up, check current slot status
$sudo nvbootctrl -t rootfs dump-slots-info
Current rootfs slot: A
Active rootfs slot: A
num_slots: 2
slot: 0,             retry_count: 3,             status: normal
slot: 1,             retry_count: 3,             status: normal

Step 3: Corrupt the current file system (current slot: A)
$sudo rm -rf /lib

Step 4: Reset the board
Power-cycle the board to reset it.

Step 5: Observe the rootfs A/B fail-over mechanism
5-1. The kernel panics at boot because the file system is corrupted.
5-2. The watchdog triggers a reset after 120 s.
5-3. The board retries booting rootfs A (slot: 0) 3 times in total.
5-4. UEFI finds that rootfs A is unbootable (it was tried 3 times and failed) and triggers a reboot to switch the rootfs slot.
5-5. The board switches to rootfs B.
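
As a rough illustration of steps 5-1 to 5-5, the sketch below models the retry counter in plain shell. It is not the UEFI implementation: `simulate_failover` is a hypothetical helper, and it assumes that each failed boot decrements `retry_count` by one and costs about one watchdog timeout. Boot time itself is ignored, so the elapsed figure is only a lower bound.

```shell
# Illustrative model of steps 5-1 to 5-5, NOT the actual UEFI logic.
# Assumption: each failed boot decrements retry_count by one and costs
# roughly one watchdog timeout before the board resets.
simulate_failover() {
    retries=$1      # initial retry_count of slot A (3 in the Step 2 dump)
    watchdog_s=$2   # watchdog timeout in seconds (120 in this post)
    elapsed=0
    while [ "$retries" -gt 0 ]; do
        retries=$((retries - 1))
        elapsed=$((elapsed + watchdog_s))
        echo "slot A boot failed, retry_count now $retries, ~${elapsed}s spent in watchdog waits"
    done
    echo "slot A marked unbootable, switching to slot B"
}
```

Calling `simulate_failover 3 120` prints three failed attempts followed by the switch, which suggests allowing at least six minutes before expecting rootfs B to come up.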

Step 6: After boot up, check current slot status again
$sudo nvbootctrl -t rootfs dump-slots-info
Current rootfs slot: B
Active rootfs slot: B
num_slots: 2
slot: 0,             retry_count: 0,             status: unbootable
slot: 1,             retry_count: 3,             status: normal
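
If you want to check for this state from a script, a minimal sketch is shown below. `list_bad_slots` is a hypothetical helper, not part of `nvbootctrl`; it assumes the dump is printed on standard output in the same format as above.

```shell
# Print every slot whose status is not "normal".
# Reads `nvbootctrl -t rootfs dump-slots-info` output on stdin
# (assumed to have the "slot: N, retry_count: N, status: S" format shown above).
list_bad_slots() {
    awk -F'[:,]' '
        /^slot:/ {
            slot = $2; status = $NF
            gsub(/[ \t]/, "", slot)                # strip padding around the slot number
            gsub(/^[ \t]+|[ \t]+$/, "", status)    # strip padding around the status
            if (status != "normal") print "slot " slot ": " status
        }'
}
```

Usage: `sudo nvbootctrl -t rootfs dump-slots-info | list_bad_slots`. Empty output means both slots report `normal`.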

The flash log and the full serial console log are attached below for your reference.
flash.log (70.4 KB)
serial.log (273.2 KB)

Methods to restore a corrupted rootfs slot

1. UEFI menu

a. Press `ESC` to enter the UEFI menu.
b. Choose Device Manager -> NVIDIA Configuration -> L4T Configuration.
c. OS chain A status: the value is Unbootable once UEFI attempts to boot the recovery kernel; change it to Normal.
d. Save and exit, then reboot. UEFI will try Direct Boot.

2. User space command

From the running kernel, you can restore the UEFI variable RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9 or RootfsStatusSlotB-781e084c-a330-417c-b678-38e696380cb9 by writing the value 0 to it.

2-1. For AGX Xavier and other devices without QSPI flash:

a. Mount the esp partition to /opt/nvidia/esp.
b. Write the variable to the esp partition:
    $cd /opt/nvidia/esp/EFI/NVDA/Variables/
    $printf "\x07\x00\x00\x00\x00\x00\x00\x00" > /tmp/var_tmp.bin
    $sudo dd if=/tmp/var_tmp.bin of=RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
c. Reboot. When the system reaches UEFI, UEFI writes the RootfsStatusSlotA value to the uefi_variable partition.
d. After the system boots to a rootfs successfully (for example, you restored the rootfs A status while booted to rootfs B), you can check that RootfsStatusSlotA has been restored.

2-2. For other devices with QSPI flash:

a. Write the variable through efivars:
    $cd /sys/firmware/efi/efivars/
    $printf "\x07\x00\x00\x00\x00\x00\x00\x00" > /tmp/var_tmp.bin
    $sudo chattr -i RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
    $sudo dd if=/tmp/var_tmp.bin of=RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
b. The RootfsStatusSlotA variable is restored immediately.
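
A note on the 8-byte payload used above. Files under /sys/firmware/efi/efivars start with a 4-byte little-endian attribute field, followed by the variable data. Here the attributes are 0x00000007 (non-volatile, boot-service access, runtime access) and the data is the 4-byte value 0. The `\x` escapes are understood by bash, but may not be understood by the `printf` of every `/bin/sh`, so the sketch below builds the same bytes with POSIX octal escapes and dumps them for inspection. `make_status_payload` and `payload_hex` are hypothetical helper names.

```shell
# Emit the 8-byte payload using POSIX octal escapes.
# Bytes 0-3: attributes 0x00000007 (little-endian); bytes 4-7: status value 0.
make_status_payload() {
    printf '\007\000\000\000\000\000\000\000'
}

# Show the payload as space-separated hex bytes so it can be checked before writing.
payload_hex() {
    make_status_payload | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}
```

`payload_hex` should print eight bytes, `07` followed by seven `00`. The payload can then be piped into `dd` in place of the temporary file, for example `make_status_payload | sudo dd of=RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9`.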

Known Issues

1. Xavier NX with SD module may not work.

The watchdog is disabled by default on this configuration. We are still looking for the cause. As a quick workaround, you can refer to this thread to enable it manually.

2. “Endless reboot” in this use case.

There is a bug in UEFI, and we have identified the root cause. The fix is under verification and may be included in a later JetPack release.

Thank you @KevinFFF. When can we expect the JetPack release that fixes the “endless reboot” issue?

Thank you @KevinFFF for the support. This solution seems to be working.

I suggest the following code snippet, if you want to make both slots bootable again:

For AGX Xavier and devices without QSPI flash:

    mount PARTLABEL=esp /opt/nvidia/esp
    for ROOTFS_STATUS in RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9 RootfsStatusSlotB-781e084c-a330-417c-b678-38e696380cb9; do
        if [ -e "/opt/nvidia/esp/EFI/NVDA/Variables/$ROOTFS_STATUS" ]; then
            chattr -i "/opt/nvidia/esp/EFI/NVDA/Variables/$ROOTFS_STATUS"
            printf "\x07\x00\x00\x00\x00\x00\x00\x00" | dd of="/opt/nvidia/esp/EFI/NVDA/Variables/$ROOTFS_STATUS" oflag=sync
        fi
    done

For devices with QSPI flash:

    for ROOTFS_STATUS in RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9 RootfsStatusSlotB-781e084c-a330-417c-b678-38e696380cb9; do
        if [ -e "/sys/firmware/efi/efivars/$ROOTFS_STATUS" ]; then
            chattr -i "/sys/firmware/efi/efivars/$ROOTFS_STATUS"
            printf "\x07\x00\x00\x00\x00\x00\x00\x00" | dd of="/sys/firmware/efi/efivars/$ROOTFS_STATUS" oflag=sync
        fi
    done


  • Only use the PARTLABEL option if exactly one disk has a partition with this label. If two partitions share the same label, one of them is picked at random.
  • Some people have a read-only file system, so the /tmp directory might not be writable. Piping printf into dd avoids the temporary file.
  • The boot slot is usually changed right before rebooting, so oflag=sync makes sure the data has actually been written rather than sitting in a cache and getting lost.
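
To confirm that a write took effect, the variable file can be read back. The sketch below is only an illustration: `read_rootfs_status` is a hypothetical helper, and it assumes the 8-byte layout used by the printf payload (4 attribute bytes, then a 4-byte value where 0 means normal). The source does not state which values other states use, so anything non-zero is reported as a raw number.

```shell
# Print the status stored in a RootfsStatusSlot* variable file.
# Assumption: byte offset 4 holds the low byte of the status value,
# and 0 means "normal" (as written by the printf payload in this thread).
read_rootfs_status() {
    value=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    if [ "$value" = "0" ]; then
        echo "normal"
    else
        echo "not normal (raw value $value)"
    fi
}
```

Usage: `read_rootfs_status /sys/firmware/efi/efivars/RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9`. On devices without QSPI flash, the file under /opt/nvidia/esp/EFI/NVDA/Variables/ may only reflect the pending write until UEFI has processed it on the next boot.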

May I suggest that this step is either added to the nvbootctrl command “set-active-boot-slot” or added as a separate command of the tool? This is already the second time we have had to use the “printf” workaround.

Thanks a lot KevinFFF. I will try this solution (I think I tried something similar and it did not work, but I might have skipped something).


@KevinFFF The issues are not listed in the Known Issues section of 5.1.1. Does that mean they are all fixed there?

I’ve tested the new version. The bugs are not fixed.
So they are targeted for 5.2?

@KevinFFF Any info on the state of your fix?

As it is not implemented in 5.1.1, will it be in 5.2, or will you add a version 5.1.2 due to its importance?

@WayneWWW @JerryChang

Can anyone comment on this issue?

Let me update the current status:

Issue 1 affects only the devkit with the SD module, not the production module. We only have the workaround for this issue.

Issue 2 will be fixed in the next release (JP5.1.2).
