I have an Orin Nano Devkit booting from NVMe with A/B booting enabled, with APP and APP_b rootfs partitions on nvme0n1p1 and nvme0n1p2.
I am able to change my active slot from a to b with the nvbootctrl command, and when I reboot it boots to APP_b on nvme0n1p2 as expected.
I then change my active slot back to slot a and reboot to revert to the original state. To prove that A/B boot redundancy works in a worst-case scenario where the APP partition gets corrupted, I remove the NVMe and erase the whole APP partition (nvme0n1p1). When I insert it back into the Jetson it seems to hang and never attempts to boot the backup APP_b on nvme0n1p2, even after multiple power cycles.
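For reference, the slot switch described above was done roughly like this (a sketch assuming r36-style nvbootctrl syntax; the slot numbering 0 = A, 1 = B is my reading of the dump-slots-info output, so check it on your own unit first):

```shell
# Sketch of the slot-switch sequence (run on the Jetson itself).
# Assumes nvbootctrl's rootfs target; slot 0 = A (APP), slot 1 = B (APP_b).
sudo nvbootctrl -t rootfs dump-slots-info        # show current/active rootfs slot
sudo nvbootctrl -t rootfs set-active-boot-slot 1 # make slot B (APP_b) active
sudo reboot                                      # should come up on nvme0n1p2
```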
I have been following the instructions from the Root File System section of the NVIDIA Jetson Linux Developer Guide and I'm not sure if I'm missing something.
hello matt.read,
please gather the complete serial console logs so we can understand what happened.
Hi Jerry,
Here are my serial log and my flash logs. The serial log shows the boot-up after APP (nvme0n1p1) has been reformatted to an empty ext4 partition.
serial_boot_log.txt (69.3 KB)
full_flash_log.txt (331.3 KB)
initrdlog_flash_3-1_0_20240726-101731.txt (51.0 KB)
From the serial log there appears to be a kernel panic.
Just to add onto my previous post I have left the UEFI settings as default:
kernel images are read from rootfs, and kernel DTB is read from the UEFI bootloader partition in QSPI
hello matt.read,
as you can see in the Rootfs Selection section, the Bootloader slot and Rootfs slot are configured together.
since you've corrupted the APP partition per your test approach, you should also switch to Bootloader B so that it always boots with Rootfs B (APP_b).
Hi Jerry
Thanks for your response. I am a little confused: what do you mean by “switch Bootloader B to be always boots with Rootfs B (APP_b)”? I thought that if boot slot A fails, booting Bootloader B with Rootfs B (APP_b) is the default behavior after flashing with the “ROOTFS_AB=1” flag. Have I missed something in the A/B boot setup?
After formatting APP to be empty I am able to stop the kernel panic by going into the UEFI L4T setup and changing slot A to UNBOOTABLE:

The system then boots to slot B. The issue is that I thought A/B booting would detect the problem with slot A and then automatically switch to the other slot after the default 3 retry attempts.
hello matt.read,
BTW, we’re able to reproduce the same failure on Orin Nano/ Jetpack-6.0 GA/ l4t-r36.3.0
please configure the boot option in UEFI to set RootFS slot-A as unbootable as a temporary workaround.
let me arrange resources for investigation.
Hi Jerry,
Thank you for your update regarding the reproduction of the same failure mode on the Orin Nano on your end.
I would like to emphasize that we utilise the Jetson Orin Nano in our remote embedded systems, which require robust boot redundancy. This feature is critical for ensuring the reliability and uptime of our deployments in various remote locations.
We are on the brink of a significant large-volume rollout to all our customers, making the timely resolution of this issue even more crucial for us. As such, we are keen to understand the timeline for when a patch addressing this failure will be available.
hello matt.read,
we’ve proved that root file system redundancy works on Jetpack-6.0 GA/ Orin Nano/ l4t-r36.3.0.
here are the steps for verification.
(1) Flash r36.3 image on Orin-Nano with NVMe
$ sudo ROOTFS_AB=1 ./tools/kernel_flash/l4t_initrd_flash.sh --showlogs -p "-c bootloader/generic/cfg/flash_t234_qspi.xml" --no-flash --network usb0 jetson-orin-nano-devkit internal
$ sudo ROOTFS_AB=1 ./tools/kernel_flash/l4t_initrd_flash.sh --showlogs --no-flash --external-device nvme0n1p1 -c ./tools/kernel_flash/flash_l4t_t234_nvme_rootfs_ab.xml --external-only --append --network usb0 jetson-orin-nano-devkit external
$ sudo ./tools/kernel_flash/l4t_initrd_flash.sh --showlogs --network usb0 --flash-only
(2) Check Rootfs-A/B slots are available.
$ sudo nvbootctrl -t rootfs dump-slots-info
$ df -h
(3) Using below command to corrupt Rootfs-A.
$ sudo dd if=/dev/zero of=/dev/nvme0n1p1 bs=1M count=1
(4) Upon a corrupted RootFS, you’ll see a mount failure reported; the system will then restart for another try.
for example,
[ 32.264604] ERROR: PARTUUID=9d4cc331-030a-44a2-b01a-50d969bf9965 mount fail...
[ 63.152102] Rebooting system...
[ 63.154226] sysrq: Resetting
(5) The system shall retry 3 times (its default setting), then switch the bootchain to APP_B and boot up successfully.
for example,
UEFI will report the following: Rebooting to new boot chain
you’ll see bootloader and kernel logs like the following as it switches to Rootfs-B.
I> Current Boot-Chain Slot: 1
I> BR-BCT Boot-Chain is 1, and status is 1. Set UPDATE_BRBCT bit to 0
[ 8.927066] EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 8.931756] Rootfs mounted over PARTUUID=4034b054-e6e9-4213-b901-adaa5a0fe99f
[ 8.941480] Switching from initrd to actual rootfs
(6) Double confirm the status with nvbootctrl
for example,
$ sudo nvbootctrl dump-slots-info
Current version: 36.3.0
Capsule update status: 0
Current bootloader slot: B
Active bootloader slot: B
num_slots: 2
slot: 0, status: normal
slot: 1, status: normal
in short,
it turns out we had used an incorrect command to simulate Rootfs-A corruption,
i.e. $ sudo rm -rf /*
Hi Jerry, thanks for the instructions. I will shortly test the command that overwrites the first 1 MB of the APP partition.
Would you know why the A/B redundancy doesn’t work for all scenarios of rootfs corruption, e.g. $ sudo rm -rf /* or reformatting it to an empty ext4? At the moment it kernel panics. Is this still a bug, and should it instead change boot slots or load the recovery kernel (“NVIDIA recovery mode”)?
Is it because the A/B system only works when damage is done to the filesystem to the extent that it will not be recognized as ext4 anymore, which is why overwriting the first 1 MB of the APP ext4 partition works: sudo dd if=/dev/zero of=/dev/nvme0n1p1 bs=1M count=1
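That distinction can be demonstrated off-target with a loopback image (a hypothetical sketch, not part of the NVIDIA tooling; the file name is made up, and it only needs e2fsprogs and `file`, no root or Jetson required):

```shell
# Demonstrate why zeroing the first 1 MB breaks ext4 detection:
# the primary ext4 superblock lives near the start of the partition.
img=$(mktemp /tmp/rootfs-XXXXXX.img)
truncate -s 16M "$img"
mkfs.ext4 -q -F "$img"                 # a valid ext4 filesystem image
before=$(file -b "$img")               # identified as an ext4 filesystem
dd if=/dev/zero of="$img" bs=1M count=1 conv=notrunc status=none
after=$(file -b "$img")                # superblock gone: plain "data"
echo "before: $before"
echo "after:  $after"
rm -f "$img"
```

By contrast, rm -rf /* deletes files through the mounted filesystem, so the superblock and metadata remain valid and the initrd can still mount the partition.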
hello matt.read,
there’s a watchdog that monitors the system; it warm reboots the system when the watchdog times out.
however, $ sudo rm -rf /* also deletes the watchdog when you corrupt the whole root file system.
in that scenario, even though the system boots up into a kernel panic, a warm reset is never triggered and the retry counter of rootfs slot-A is not consumed; that’s why the rootfs does not switch to slot-B.
besides, if you use the hardware reset button, that’s a cold reset, and the rootfs will not switch on a cold reset.
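for reference, a userspace watchdog feeder is conceptually just this (a simplified sketch for illustration, not NVIDIA’s actual daemon):

```shell
# Simplified sketch of a userspace watchdog feeder (illustrative only).
# Once the watchdog device has been opened and armed, the hardware
# watchdog fires a warm reset if no write arrives before the timeout.
while true; do
    printf '.' > /dev/watchdog   # any write "feeds" (restarts) the timer
    sleep 5
done
```

if userspace is wiped out and the system panics before anything arms or feeds the timer, the watchdog-driven warm reset that would consume the slot-A retry counter never happens.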
Hi Jerry, I don’t fully understand,
so there is a software watch-dog that triggers a warm reset when it times out.
I am assuming it’s one of these devices:

You say it gets deleted if you run sudo rm -rf /*, but surely running sudo dd if=/dev/zero of=/dev/nvme0n1p1 bs=1M count=1 would also corrupt/stop the watchdog daemon from working?
With both commands above I would expect the same outcome: the watchdog daemon stops writing to the watchdog device in /dev/ as the rootfs is deleted/corrupted, the watchdog then triggers a warm reset when it times out, and this repeats until the retry counter reaches 3 and the boot slot changes.
I am still unsure why sudo rm -rf /* doesn’t trigger the watchdog timeout/warm reset but sudo dd if=/dev/zero of=/dev/nvme0n1p1 bs=1M count=1 does.
My guess would be that sudo rm -rf /* doesn’t trigger the watchdog timeout/warm reset because it kernel panics before the watchdog gets the chance.
In short, I need a solution/kernel patch so that A/B booting works for both corruption scenarios, as I also need to protect our system from user errors such as sudo rm -rf /*.
hello matt.read,
the reboot logic only triggers when it has failed to mount the filesystem.
please see also the error logs from running $ sudo rm -rf /*.
according to the logs, the corrupted Rootfs-A was mounted successfully.
[ 9.002800] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[ 9.007463] Rootfs mounted over PARTUUID=42e116e1-0063-4aec-b1d8-22caf8e228e1
[ 9.015057] Switching from initrd to actual rootfs
[ 9.020886] Kernel panic - not syncing:
[ 9.020890] Attempted to kill init! exitcode=0x00007f00
[ 9.020895] CPU: 2 PID: 1 Comm: chroot Not tainted 5.15.136-tegra #1
[ 9.020899] Hardware name: NVIDIA NVIDIA Jetson Orin Nano Developer Kit/Jetson, BIOS 36.3.0-gcid-36191598 05/06/2024
[ 9.020901] Call trace:
[ 9.020902] dump_backtrace+0x0/0x1d0
[ 9.020915] show_stack+0x34/0x50
[ 9.020919] dump_stack_lvl+0x68/0x8c
[ 9.020926] dump_stack+0x18/0x3c
[ 9.020929] panic+0xc4/0x398
[ 9.020932] do_exit+0xa1c/0xa50
[ 9.020936] do_group_exit+0x44/0xb0
[ 9.020939] __arm64_sys_exit_group+0x2c/0x30
[ 9.020942] invoke_syscall+0x5c/0x150
[ 9.020946] el0_svc_common.constprop.0+0x64/0x120
[ 9.020950] do_el0_svc+0x74/0xb0
[ 9.020953] el0_svc+0x28/0x90
[ 9.020955] el0t_64_sync_handler+0xac/0x130
[ 9.020957] el0t_64_sync+0x1a4/0x1a8
Hi Jerry, yes you are correct: running sudo rm -rf /* does not corrupt the ext4 filesystem structures, and that’s why it mounts successfully.
Are you saying that the Nvidia rootfs redundancy only works when the APP partition is corrupted (corrupted ext4) and fails to mount?
If the rootfs is unbootable due to missing key dependencies but still able to mount at boot-up (NOT corrupted ext4), NVIDIA rootfs redundancy doesn’t work and it will kernel panic instead?
hello matt.read,
FYI, we’ve tested with $ sudo rm -rf /* to simulate Rootfs-A corruption, and it was able to restore/switch to Rootfs-B on the r35.2.1 release.
there are lots of changes in JP-6.
currently, root file system redundancy only works when the APP partition is corrupted and fails to mount.
Hi Jerry, thanks for confirming that it’s an issue on JP6/L4T r36.3.0. Please would you be able to arrange resources to investigate a patch for this, as it seems to be quite a big issue for A/B redundancy.