I have an Orin Nano Devkit booting from NVMe with A/B booting enabled, with APP and APP_b rootfs partitions on nvme0n1p1 and nvme0n1p2.
I am able to change my active slot from a to b with the nvbootctrl command, and when I reboot it boots to APP_b on nvme0n1p2 as expected.
I then change my active slot back to slot a and reboot to revert to the original state. To prove that A/B boot redundancy works in a worst-case scenario where the APP partition gets corrupted, I remove the NVMe and erase the whole APP partition (nvme0n1p1). When I insert it back into the Jetson it seems to hang and never attempts to boot the backup APP_b on nvme0n1p2, even after multiple power cycles.
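For reference, the slot switch described above was done roughly like this (a sketch assuming r36-style nvbootctrl syntax; the slot numbering 0 = A, 1 = B is my reading of the dump-slots-info output, so check it on your own unit first):

```shell
# Sketch of the slot-switch sequence (run on the Jetson itself).
# Assumes nvbootctrl's rootfs target; slot 0 = A (APP), slot 1 = B (APP_b).
sudo nvbootctrl -t rootfs dump-slots-info        # show current/active rootfs slot
sudo nvbootctrl -t rootfs set-active-boot-slot 1 # make slot B (APP_b) active
sudo reboot                                      # should come up on nvme0n1p2
```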
I have been following the instructions from the Root File System section of the NVIDIA Jetson Linux Developer Guide and I'm not sure if I'm missing something.
hello matt.read,
please gather the complete serial console logs so we can understand what happened.
Hi Jerry,
Here are my serial log and my flash logs. The serial log shows the boot-up after APP (nvme0n1p1) has been reformatted to an empty ext4 partition.
serial_boot_log.txt (69.3 KB)
full_flash_log.txt (331.3 KB)
initrdlog_flash_3-1_0_20240726-101731.txt (51.0 KB)
From the serial log there appears to be a kernel panic.
Just to add onto my previous post I have left the UEFI settings as default:
kernel images are read from rootfs, and kernel DTB is read from the UEFI bootloader partition in QSPI
hello matt.read,
as you can see in the Rootfs Selection section, the Bootloader slot and Rootfs slot are configured together.
since you've corrupted the APP partition per your test approach, you should also switch to Bootloader B so that it always boots with Rootfs B (APP_b).
Hi Jerry
Thanks for your response. I am a little confused: what do you mean by “switch Bootloader B to be always boots with Rootfs B (APP_b)”? I thought that if boot slot A fails, booting Bootloader B with Rootfs B (APP_b) is the default behavior after flashing with the “ROOTFS_AB=1” flag. Have I missed something in the A/B boot setup?
After formatting APP to be empty I am able to stop the kernel panic by going into the UEFI L4T setup and changing slot A to UNBOOTABLE:

The system then boots to slot B. The issue is that I thought A/B booting would detect the problem with slot A and then automatically switch to the other slot after the default 3 retry attempts.
hello matt.read,
BTW, we’re able to reproduce the same failure on Orin Nano/ Jetpack-6.0 GA/ l4t-r36.3.0
please configure the boot option in UEFI to set RootFS slot-A as unbootable as a temporary workaround.
let me arrange resources for investigation.
Hi Jerry,
Thank you for your update regarding the reproduction of the same failure mode on the Orin Nano on your end.
I would like to emphasize that we utilise the Jetson Orin Nano in our remote embedded systems, which require robust boot redundancy. This feature is critical for ensuring the reliability and uptime of our deployments in various remote locations.
We are on the brink of a significant large-volume rollout to all our customers, making the timely resolution of this issue even more crucial for us. As such, we are keen to understand the timeline for when a patch addressing this failure will be available.
hello matt.read,
we’ve proved that root file system redundancy works on Jetpack-6.0 GA/ Orin Nano/ l4t-r36.3.0.
here are the steps for verification.
(1) Flash r36.3 image on Orin-Nano with NVMe
$ sudo ROOTFS_AB=1 ./tools/kernel_flash/l4t_initrd_flash.sh --showlogs -p "-c bootloader/generic/cfg/flash_t234_qspi.xml" --no-flash --network usb0 jetson-orin-nano-devkit internal
$ sudo ROOTFS_AB=1 ./tools/kernel_flash/l4t_initrd_flash.sh --showlogs --no-flash --external-device nvme0n1p1 -c ./tools/kernel_flash/flash_l4t_t234_nvme_rootfs_ab.xml --external-only --append --network usb0 jetson-orin-nano-devkit external
$ sudo ./tools/kernel_flash/l4t_initrd_flash.sh --showlogs --network usb0 --flash-only
(2) Check Rootfs-A/B slots are available.
$ sudo nvbootctrl -t rootfs dump-slots-info
$ df -h
(3) Using below command to corrupt Rootfs-A.
$ sudo dd if=/dev/zero of=/dev/nvme0n1p1 bs=1M count=1
(4) Upon a corrupted RootFS, you’ll see a mount failure reported; the system will then restart for another try.
for example,
[ 32.264604] ERROR: PARTUUID=9d4cc331-030a-44a2-b01a-50d969bf9965 mount fail...
[ 63.152102] Rebooting system...
[ 63.154226] sysrq: Resetting
(5) The system shall retry 3 times (its default setting), then switch the bootchain to APP_B and boot up successfully.
for example,
UEFI will report the following: Rebooting to new boot chain
you’ll see bootloader and kernel logs like the following as it switches to Rootfs-B.
I> Current Boot-Chain Slot: 1
I> BR-BCT Boot-Chain is 1, and status is 1. Set UPDATE_BRBCT bit to 0
[ 8.927066] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 8.931756] Rootfs mounted over PARTUUID=4034b054-e6e9-4213-b901-adaa5a0fe99f
[ 8.941480] Switching from initrd to actual rootfs
(6) Double confirm the status with nvbootctrl
for example,
$ sudo nvbootctrl dump-slots-info
Current version: 36.3.0
Capsule update status: 0
Current bootloader slot: B
Active bootloader slot: B
num_slots: 2
slot: 0, status: normal
slot: 1, status: normal
in short,
it turns out we had used an incorrect command to simulate Rootfs-A corruption,
i.e. $ sudo rm -rf /*
Hi Jerry, thanks for the instructions. I will shortly test the command that overwrites the first 1 MB of the APP partition.
Would you know why the A/B redundancy doesn’t work for all scenarios of rootfs corruption, e.g. $ sudo rm -rf /* or reformatting it to an empty ext4? At the moment it kernel panics. Is this still a bug, and should it instead change boot slots or load the recovery kernel (“NVIDIA recovery mode”)?
Is it because the A/B system only works when damage is done to the filesystem to the extent that it will not be recognized as ext4 anymore, which is why overwriting the first 1 MB of the APP ext4 partition works: sudo dd if=/dev/zero of=/dev/nvme0n1p1 bs=1M count=1
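That distinction can be demonstrated off-target with a loopback image (a hypothetical sketch, not part of the NVIDIA tooling; the file name is made up, and it only needs e2fsprogs and `file`, no root or Jetson required):

```shell
# Demonstrate why zeroing the first 1 MB breaks ext4 detection:
# the primary ext4 superblock lives near the start of the partition.
img=$(mktemp /tmp/rootfs-XXXXXX.img)
truncate -s 16M "$img"
mkfs.ext4 -q -F "$img"                 # a valid ext4 filesystem image
before=$(file -b "$img")               # identified as an ext4 filesystem
dd if=/dev/zero of="$img" bs=1M count=1 conv=notrunc status=none
after=$(file -b "$img")                # superblock gone: plain "data"
echo "before: $before"
echo "after:  $after"
rm -f "$img"
```

By contrast, rm -rf /* deletes files through the mounted filesystem, so the superblock and metadata remain valid and the initrd can still mount the partition.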
hello matt.read,
there’s a watchdog that monitors the system; it warm reboots the system when the watchdog times out.
however, $ sudo rm -rf /* also deletes the watchdog when you corrupt the whole root file system.
in that scenario, even though the system boots up into a kernel panic, a warm reset is never triggered and the retry counter of rootfs slot-A is not consumed; that’s why the rootfs does not switch to slot-B.
besides, if you use the hardware reset button, that’s a cold reset, and the rootfs will not switch on a cold reset.
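for reference, a userspace watchdog feeder is conceptually just this (a simplified sketch for illustration, not NVIDIA’s actual daemon):

```shell
# Simplified sketch of a userspace watchdog feeder (illustrative only).
# Once the watchdog device has been opened and armed, the hardware
# watchdog fires a warm reset if no write arrives before the timeout.
while true; do
    printf '.' > /dev/watchdog   # any write "feeds" (restarts) the timer
    sleep 5
done
```

if userspace is wiped out and the system panics before anything arms or feeds the timer, the watchdog-driven warm reset that would consume the slot-A retry counter never happens.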
Hi Jerry, I don’t fully understand,
so there is a software watch-dog that triggers a warm reset when it times out.
I am assuming it’s one of these devices:

You say it gets deleted if you run sudo rm -rf /*, but surely running sudo dd if=/dev/zero of=/dev/nvme0n1p1 bs=1M count=1 would also corrupt/stop the watchdog daemon from working?
With both commands above I would expect the same outcome: the watchdog daemon stops writing to the watchdog device in /dev/ as the rootfs is deleted/corrupted, the watchdog then triggers a warm reset when it times out, and this repeats until the retry counter reaches 3 and the boot slot changes.
I am still unsure why sudo rm -rf /* doesn’t trigger the watchdog timeout/warm reset but sudo dd if=/dev/zero of=/dev/nvme0n1p1 bs=1M count=1 does.
My guess would be that sudo rm -rf /* doesn’t trigger the watchdog timeout/warm reset because it kernel panics before the watchdog gets the chance.
In short, I need a solution/kernel patch so that A/B booting works for both corruption scenarios, as I also need to protect our system from user errors such as sudo rm -rf /*.
hello matt.read,
the reboot logic only triggers when it has failed to mount the filesystem.
please see also the error logs from running $ sudo rm -rf /*.
according to the logs, the corrupted Rootfs-A was mounted successfully.
[ 9.002800] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 9.007463] Rootfs mounted over PARTUUID=42e116e1-0063-4aec-b1d8-22caf8e228e1
[ 9.015057] Switching from initrd to actual rootfs
[ 9.020886] Kernel panic - not syncing:
[ 9.020890] Attempted to kill init! exitcode=0x00007f00
[ 9.020895] CPU: 2 PID: 1 Comm: chroot Not tainted 5.15.136-tegra #1
[ 9.020899] Hardware name: NVIDIA NVIDIA Jetson Orin Nano Developer Kit/Jetson, BIOS 36.3.0-gcid-36191598 05/06/2024
[ 9.020901] Call trace:
[ 9.020902] dump_backtrace+0x0/0x1d0
[ 9.020915] show_stack+0x34/0x50
[ 9.020919] dump_stack_lvl+0x68/0x8c
[ 9.020926] dump_stack+0x18/0x3c
[ 9.020929] panic+0xc4/0x398
[ 9.020932] do_exit+0xa1c/0xa50
[ 9.020936] do_group_exit+0x44/0xb0
[ 9.020939] __arm64_sys_exit_group+0x2c/0x30
[ 9.020942] invoke_syscall+0x5c/0x150
[ 9.020946] el0_svc_common.constprop.0+0x64/0x120
[ 9.020950] do_el0_svc+0x74/0xb0
[ 9.020953] el0_svc+0x28/0x90
[ 9.020955] el0t_64_sync_handler+0xac/0x130
[ 9.020957] el0t_64_sync+0x1a4/0x1a8
Hi Jerry, yes you are correct: running sudo rm -rf /* does not corrupt the ext4 filesystem structures, and that’s why it mounts successfully.
Are you saying that the Nvidia rootfs redundancy only works when the APP partition is corrupted (corrupted ext4) and fails to mount?
If the rootfs is unbootable due to missing key dependencies but still able to mount at boot-up (NOT corrupted ext4), NVIDIA rootfs redundancy doesn’t work and it will kernel panic instead?
hello matt.read,
FYI, we’ve tested with $ sudo rm -rf /* to simulate Rootfs-A corruption, and it was able to restore/switch to Rootfs-B on the r35.2.1 release.
there are lots of changes in JP-6.
currently, root file system redundancy only works when the APP partition is corrupted and fails to mount.
Hi Jerry, thanks for confirming that it’s an issue on JP6/L4T r36.3.0. Please would you be able to arrange resources to investigate a patch for this, as it seems to be quite a big issue for A/B redundancy.