Rootfs A/B redundancy fail-over mechanism in Jetpack5.1

KevinFFF · March 24, 2023, 9:56am

Several topics have asked about fail-over rootfs slot switching in Jetpack 5.1.
Since many users have the same request, this post shares some tips and the verified flow.

Verification steps on the Xavier NX devkit with eMMC

Step 1: Flash the board with rootfs A/B enabled 
$sudo ROOTFS_AB=1 ./flash.sh jetson-xavier-nx-devkit-emmc mmcblk0p1

Step 2: After boot up, check current slot status
$sudo nvbootctrl -t rootfs dump-slots-info
Current rootfs slot: A
Active rootfs slot: A
num_slots: 2
slot: 0,             retry_count: 3,             status: normal
slot: 1,             retry_count: 3,             status: normal

Step 3: Corrupt the current file system (current slot: A)
$sudo rm -rf /lib

Step 4: Reset the board
Power-cycle the board to reset it.

Step 5: rootfs A/B fail-over mechanism
5-1. The kernel panics at boot because the file system is corrupted.
5-2. The watchdog triggers a reset after 120 s.
5-3. Booting rootfs A (slot: 0) is retried 3 times in total.
5-4. UEFI finds rootfs A unbootable (3 failed attempts) and triggers a reboot to switch the rootfs slot.
5-5. The system switches to rootfs B.
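For a feel of how long the switch takes, the numbers in step 5 can be multiplied out. This is only a rough lower bound: it assumes each failed attempt costs one full 120 s watchdog timeout and ignores the boot time before each panic. The variable and function names are made up for illustration.

```shell
#!/bin/sh
# Rough lower bound on the time spent before the slot switch,
# using the numbers from step 5. Boot time itself is not included.
WATCHDOG_TIMEOUT_S=120   # step 5-2
RETRY_COUNT=3            # step 5-3

failover_delay_s() {
    # $1 = watchdog timeout in seconds, $2 = number of attempts
    echo $(( $1 * $2 ))
}

failover_delay_s "$WATCHDOG_TIMEOUT_S" "$RETRY_COUNT"   # prints 360
```

So expect to wait at least about six minutes before the board comes up on the other slot.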

Step 6: After boot up, check current slot status again
$sudo nvbootctrl -t rootfs dump-slots-info
Current rootfs slot: B
Active rootfs slot: B
num_slots: 2
slot: 0,             retry_count: 0,             status: unbootable
slot: 1,             retry_count: 3,             status: normal
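If you want to check the slot state from a script, the dump can be filtered for slots that are not "normal". The following is a minimal sketch that assumes the output keeps the exact line format shown above; `unhealthy_slots` is a hypothetical helper name, not part of nvbootctrl.

```shell
#!/bin/sh
# Sketch: list rootfs slots whose status is not "normal".
# Reads `nvbootctrl -t rootfs dump-slots-info` output on stdin.
# Assumes lines of the form "slot: N, retry_count: N, status: S".
unhealthy_slots() {
    awk -F'[:,]' '
        /^slot:/ {
            slot = $2; status = $6
            gsub(/[ \t]/, "", slot)      # strip the column padding
            gsub(/[ \t]/, "", status)
            if (status != "normal") print slot, status
        }'
}

# On the target (not run here):
#   sudo nvbootctrl -t rootfs dump-slots-info | unhealthy_slots
```

With the Step 6 output above, this would report slot 0 as unbootable and print nothing for slot 1.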

The flash log and the full serial console log are attached for your reference.
flash.log (70.4 KB)
serial.log (273.2 KB)

The methods to restore corrupted rootfs slot

1. UEFI menu

a. Press `ESC` to enter the UEFI menu.
b. Choose Device Manager -> NVIDIA Configuration -> L4T Configuration.
c. OS chain A status: the value is Unbootable if UEFI attempts the recovery kernel; set it to Normal.
d. Save and exit, then reboot. UEFI will try Direct Boot.

2. User space command

From user space (the running kernel), you can restore the UEFI variable RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9 or RootfsStatusSlotB-781e084c-a330-417c-b678-38e696380cb9 by writing the value 0 to it.

2-1. For AGX Xavier and other devices without QSPI flash:

a. Mount the esp partition at /opt/nvidia/esp.
b. Write the variable to the esp:
    $cd /opt/nvidia/esp/EFI/NVDA/Variables/
    $printf "\x07\x00\x00\x00\x00\x00\x00\x00" > /tmp/var_tmp.bin
    $sudo dd if=/tmp/var_tmp.bin of=RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
c. Reboot. When the system boots to UEFI, UEFI writes the RootfsStatusSlotA value to the uefi_variable partition.
d. After the system boots to a rootfs successfully (for example, after restoring the rootfs A status while booted to rootfs B), check that RootfsStatusSlotA is restored.
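Before copying the payload into place, it can be worth confirming that the file really holds the eight bytes used in step b. Here is a small sketch of that check; the helper names are hypothetical, and octal escapes are used because `printf "\x.."` is not supported by every /bin/sh (it works in bash).

```shell
#!/bin/sh
# Sketch: build the 8-byte payload from step b and inspect it before use.
make_status_payload() {
    # bytes 0-3: 07 00 00 00, bytes 4-7: 00 00 00 00 (value 0, as in the steps above)
    printf '\007\000\000\000\000\000\000\000'
}

payload_hex() {
    # Print a file's bytes as space-separated hex.
    od -An -v -tx1 "$1" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Example (not run here):
#   make_status_payload > /tmp/var_tmp.bin
#   payload_hex /tmp/var_tmp.bin    # should show 07 followed by seven 00 bytes
```

If the file is not exactly 8 bytes long, or the first byte is not 07, something went wrong with the shell's escape handling and the file should not be written to the esp.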

2-2. For other devices with QSPI flash:

a. Write the variable to efivars:
    $cd /sys/firmware/efi/efivars/
    $printf "\x07\x00\x00\x00\x00\x00\x00\x00" > /tmp/var_tmp.bin
    $sudo chattr -i RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
    $sudo dd if=/tmp/var_tmp.bin of=RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
b. The RootfsStatusSlotA variable is restored immediately.
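To confirm the result, the variable can be read back. In efivarfs, each file starts with four attribute bytes followed by the variable data, so the sketch below skips the first four bytes and prints the next four as hex. `read_status_value` is a hypothetical helper; the only value this post defines is 0 (normal), so anything else is simply shown raw.

```shell
#!/bin/sh
# Sketch: print the 4 data bytes that follow the 4 attribute bytes.
read_status_value() {
    # -j4 skips the attribute bytes, -N4 reads the next 4 bytes
    od -An -v -tx1 -j4 -N4 "$1" | tr -d ' \n'
}

# On the target (not run here):
#   read_status_value /sys/firmware/efi/efivars/RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9
# A restored slot should print 00000000.
```

The bytes are printed in file order, so a non-zero value appears little-endian (lowest byte first).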

Known Issues

1. Xavier NX with SD module may not work.

The watchdog is disabled by default on this module. We are still investigating the cause. As a quick workaround, you can refer to this thread to enable it manually.

2. The “endless reboot” in this use case.

There is a bug in UEFI, and we have found the root cause. The fix is under verification and may be included in a later Jetpack release.

sanaurrehman · March 24, 2023, 10:41am

Thank you @KevinFFF. When can we expect the Jetpack release that solves the “endless reboot” issue?

seeky15 · March 28, 2023, 5:32am

Thank you @KevinFFF for the support. This solution seems to be working.

I suggest the following code snippet, if you want to make both slots bootable again:

For AGX Xavier and devices without QSPI

    mount PARTLABEL=esp /opt/nvidia/esp
    for ROOTFS_STATUS in RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9 RootfsStatusSlotB-781e084c-a330-417c-b678-38e696380cb9; do
        if [ -e "/opt/nvidia/esp/EFI/NVDA/Variables/$ROOTFS_STATUS" ]; then
            chattr -i "/opt/nvidia/esp/EFI/NVDA/Variables/$ROOTFS_STATUS"
            printf "\x07\x00\x00\x00\x00\x00\x00\x00" | dd of="/opt/nvidia/esp/EFI/NVDA/Variables/$ROOTFS_STATUS" oflag=sync
        fi
    done

With QSPI

    for ROOTFS_STATUS in RootfsStatusSlotA-781e084c-a330-417c-b678-38e696380cb9 RootfsStatusSlotB-781e084c-a330-417c-b678-38e696380cb9; do
        if [ -e "/sys/firmware/efi/efivars/$ROOTFS_STATUS" ]; then
            chattr -i "/sys/firmware/efi/efivars/$ROOTFS_STATUS"
            printf "\x07\x00\x00\x00\x00\x00\x00\x00" | dd of="/sys/firmware/efi/efivars/$ROOTFS_STATUS" oflag=sync
        fi
    done

Explanation

Only use the PARTLABEL option if exactly one disk has a partition with this label. If two partitions share the same label, one of them is selected at random.
Some people have a read-only file system, so /tmp might not be writable; piping printf into dd avoids the temporary file.
Since the boot slot is usually changed right before rebooting, oflag=sync makes sure the data has actually been written rather than cached and lost.
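The PARTLABEL caveat can be guarded against in a script by counting the matching partitions first. This is just a sketch: `count_partlabel` is a hypothetical helper that expects "NAME PARTLABEL" pairs on stdin, such as the raw output of `lsblk -rno NAME,PARTLABEL`.

```shell
#!/bin/sh
# Sketch: count partitions carrying a given PARTLABEL.
# Reads "NAME PARTLABEL" pairs on stdin, one partition per line.
count_partlabel() {
    awk -v label="$1" '$2 == label { n++ } END { print n + 0 }'
}

# On the target (not run here):
#   n=$(lsblk -rno NAME,PARTLABEL | count_partlabel esp)
#   if [ "$n" -eq 1 ]; then
#       mount PARTLABEL=esp /opt/nvidia/esp
#   else
#       echo "expected exactly one esp partition, found $n" >&2
#   fi
```

If more than one partition is reported, mount by explicit device node instead of by label.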

May I suggest that this step is either added to the nvbootctrl command “set-active-boot-slot” or added as a separate command of the tool? This is already the second time we have had to use the “printf” workaround.

alvaro.gimenez.s · March 29, 2023, 7:34am

Thanks a lot, KevinFFF. I will try these solutions (I think I tried something similar and it did not work, but I might have skipped something).

Regards,
Alvaro.

seeky15 · March 30, 2023, 6:41am

@KevinFFF The issues are not listed in the Known Issues section of 5.1.1. Does that mean they are all fixed there?

seeky15 · March 30, 2023, 9:19am

I’ve tested the new version. The bugs are not fixed.
So they are targeted for 5.2?

seeky15 · April 3, 2023, 4:57am

@KevinFFF Any info on the state of your fix?

As it is not implemented in 5.1.1, will it be in 5.2, or will you add a version 5.1.2 due to its importance?

seeky15 · April 11, 2023, 4:53am

@WayneWWW @JerryChang

Can anyone comment on this issue?

KevinFFF · May 17, 2023, 2:38am

Let me update the current status:

Issue 1 affects only the devkit with the SD module, not the production module. We only have the workaround for this issue.

Issue 2 will be fixed in the next release (JP5.1.2)

farough · March 14, 2024, 6:18pm

Hi,

I am trying to get fail-over working on both Jetpack 4.6 and Jetpack 5.1 on NVMe.
We are working on a product, and OS fail-over is important for our application.
I followed the steps for setting up a redundant rootfs.

On Jetpack 5.1.1, when I check the status of the watchdog using this command, it is disabled.
cat /proc/device-tree/watchdog@30c0000/status
I repeated the test on Jetpack 5.1.2 on NVMe; the watchdog is still disabled.
I tested Jetpack 5.1.3 on NVMe, and it does not even boot. This Jetpack is problematic from the start, before any redundant rootfs steps.
On Jetpack 4.6.3 on NVMe, the OS does not boot after running the redundant rootfs command (note that this happens before corrupting the rootfs for testing).

Redundant rootfs did not work on any of these Jetpack versions.
I am mainly interested in Jetpack 5.1.1 since many of our products are flashed with this version. I don’t know how to enable the watchdog flag. I checked the thread under known issue 1, but I still don’t know how to fix this problem. Any instructions are appreciated.

Regards,
Farough

KevinFFF · March 15, 2024, 6:26am

The watchdog can be enabled through the device tree.
Please share the full dmesg and the device tree for further checking.
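When checking the node from a script, note that device-tree string properties under /proc/device-tree end with a NUL byte, so it helps to strip it before comparing. A minimal sketch follows; `dt_status` is a hypothetical helper, and the node path is the one quoted earlier in this thread, which may differ on other modules.

```shell
#!/bin/sh
# Sketch: print a device-tree string property without its trailing NUL byte.
dt_status() {
    tr -d '\000' < "$1"
}

# On the target (not run here):
#   if [ "$(dt_status /proc/device-tree/watchdog@30c0000/status)" = "okay" ]; then
#       echo "watchdog enabled"
#   else
#       echo "watchdog not enabled"
#   fi
```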

farough · March 15, 2024, 3:22pm

Where can I find documentation on how to modify the device tree? Do I need to re-flash the Jetson?
What change do I need to make to enable the watchdog?

Thanks

KevinFFF · March 18, 2024, 3:48am

Please use the dtc tool on your board to decompile the dtb in /boot/dtb/kernel_XXX.dtb to dts.
After modifying the device tree, run dtc again to compile it back to dtb, then reboot the board to apply the change.
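The round trip described above could look roughly like the following. Treat it as a sketch under assumptions: kernel_XXX.dtb is the placeholder from this post, the function names are hypothetical, and the edit assumes the watchdog node is named watchdog@30c0000 (as in the /proc path quoted earlier in the thread) and currently carries status = "disabled". Back up the original dtb before replacing it.

```shell
#!/bin/sh
# Sketch of the decompile / edit / recompile round trip with dtc.
decompile_dtb() { dtc -I dtb -O dts -o "$2" "$1"; }   # $1 = input dtb, $2 = output dts
compile_dts()   { dtc -I dts -O dtb -o "$2" "$1"; }   # $1 = input dts, $2 = output dtb

# Print the dts with the watchdog node switched from "disabled" to "okay".
# Only lines between the node's opening brace and the next "};" are touched.
enable_watchdog() {
    sed '/watchdog@30c0000 {/,/};/ s/status = "disabled"/status = "okay"/' "$1"
}

# On the target (not run here; substitute the real dtb name for kernel_XXX.dtb):
#   decompile_dtb /boot/dtb/kernel_XXX.dtb /tmp/kernel.dts
#   enable_watchdog /tmp/kernel.dts > /tmp/kernel_wdt.dts
#   compile_dts /tmp/kernel_wdt.dts /tmp/kernel_wdt.dtb
```

It is worth diffing the two dts files before compiling, to confirm that only the watchdog node changed.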

liu.junnan · June 13, 2025, 7:55am

Hi @KevinFFF, I need help here.
I have a similar issue but can’t resolve it by modifying the dtb file.
Please have a look:

KevinFFF · June 16, 2025, 4:14am

This topic was created based on Jetpack 5.1.
I suggest updating to the latest Jetpack 5.1.5 (r35.6.2) or Jetpack 6.2 (r36.4.3) to verify, since several fixes are included in later releases.
