Bootloader does not fall back to slot A when slot B can't boot (rootfs A/B)

We are experimenting with rootfs A/B to implement our own image-update process.

We plan to use the APT approach (NOT the image-based approach), so our plan is to:

  • flash two partitions with the OS (rootfs A/B)
  • mount the rootfs of the non-current slot X and make changes to it via chroot
  • select the changed slot for booting (nvbootctrl set-active-boot-slot X) and reboot
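For reference, the flow above can be sketched as a shell script. This is only a sketch under assumptions: the partition labels APP (slot A rootfs) and APP_b (slot B rootfs), the mount point /mnt, and the apt commands are illustrative, not taken from our actual setup.

```shell
#!/bin/sh
# Sketch of the planned APT-based update flow (assumptions: standard
# L4T partition labels APP/APP_b, slot ids 0 = A and 1 = B).

other_slot() {
    # Given the current slot id (0 or 1), return the slot to update.
    if [ "$1" = "0" ]; then echo 1; else echo 0; fi
}

# Real flow (run as root; commented out because it touches the device):
#   CUR=$(nvbootctrl get-current-slot)            # 0 or 1
#   TARGET=$(other_slot "$CUR")
#   PART=$([ "$TARGET" = "0" ] && echo APP || echo APP_b)
#   mount "/dev/disk/by-partlabel/$PART" /mnt
#   chroot /mnt apt-get update
#   chroot /mnt apt-get -y upgrade
#   umount /mnt
#   nvbootctrl set-active-boot-slot "$TARGET"     # boot the updated slot next
#   reboot
```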

All of the above has worked so far, but when we test the mechanism to fall back to slot Y if slot X is not bootable, the Xavier hangs and no fallback happens, no matter how long we wait.

To make slot X unbootable we replaced the file /boot/Image once with an empty file and once with a link to a valid Image file. With the empty file in place the board hangs at the NVIDIA splash screen; with the link we got a black screen with no splash whatsoever. So it seems the bootloader does not detect a problem and/or does not fall back to slot Y.

We are using the latest L4T 32.6 tarball and flashed the board via ROOTFS_AB=1 ./flash.sh jetson-xavier mmcblk0p1

Here is our SMD file from bootloader/smd_info.rootfs_AB.cfg. Note that we adjusted MAX_BL_RETRY_COUNT and MAX_ROOTFS_AB_RETRY_COUNT to allow only one failure. That was to reduce wait times in case we had to wait for the bootloader to detect an unsuccessful boot.

# SMD metadata information
< VERSION 5 >
# Set the maximum boot slot retry count
# Please make sure this field is set before slot info config
# The valid setting is 1 to 7
< MAX_BL_RETRY_COUNT 1 >

# Set the maximum rootfs slot retry count
# Please make sure this field is set before slot info config
# The valid setting is 1 to 3
< MAX_ROOTFS_AB_RETRY_COUNT 1 >

#
# Config 1: Disable A/B support (by removing comments ##)
#

# slot info order is important!
# <priority>    <suffix>  <boot_successful>
##15                  _a        1

#
# Config 2: Enable rootfs A/B support (default)
#
< REDUNDANCY_ENABLE 1 >
< ROOTFS_AB 1 >

# To enable rootfs autosync, use < RF_AUTOSYNC_ENABLE 1 >
# This option must be defined after "< ROOTFS_AB 1 >"
##< RF_AUTOSYNC_ENABLE 1 >

# Select rootfs A as the active rootfs
< ROOTFS_ACTIVE_A 1 >
##< ROOTFS_ACTIVE_B 1 >

# Enable/disable unified bootloader AB and rootfs AB
# Set 1 to enable, set 0 to disable. Default is enabled.
# This option must be defined after "< ROOTFS_AB 1 >"
# When < ROOTFS_BL_UNIFIED_AB 1 > is set,
# auto sync for both BL and RF are disabled.
< ROOTFS_BL_UNIFIED_AB 1 >

# To disable bootloader autosync, use < BL_AUTOSYNC_DISABLE 1 >, default is disabled.
# REDUNDANCY_ENABLE or REDUNDANCY_USER must be defined before BL_AUTOSYNC_DISABLE !
< BL_AUTOSYNC_DISABLE 1 >

# slot info order is important!
# <priority>    <suffix>  <boot_successful>
15                  _a        1
14                  _b        1

We applied those adjustments by running ./nv_smd_generator smd_info.rootfs_AB.cfg slot_metadata.bin.rootfsAB

Are there other settings needed for our simple test to succeed?

Hi,
Please refer to this page and get the uart log for reference:
Jetson/General debug - eLinux.org

Do you observe the issue on Xavier developer kit or custom board?

@DaneLLL okay, we will try to get the log via UART, but this could take some days since it seems we need special hardware for this.

We were observing this on a “custom” board; it's the “stevie-xavier” from Diamond Systems (STEVIE™ Carrier and Dev Kit for NVIDIA Jetson AGX Xavier). But we will also check this on the Xavier devkit again.

Just to have this confirmed: are we missing any step in our description above? In other words, should this work on a Xavier devkit the way we are doing it?

hello brootux,

the logic is that once the retry_count is exhausted, CBoot will select the other rootfs slot: it only checks and boots from the next slot when the retry count drops below 0. The default value of Rootfs_Retry_Count is 3.

there’s a background service, nv_update_verifier.service. It first triggers l4t-rootfs-validation-config.service, which provides an interface for users to customize when a boot counts as successful. If the validation script doesn’t exist or returns true, the rootfs is considered to have booted up successfully.
If the rootfs validation is true, nv_update_verifier.service will run /usr/sbin/nv_update_engine --verify; nv_update_engine will increase the retry_count and update the slot status.
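As a hypothetical illustration of such a validation hook (the marker path and the check itself are assumptions for illustration; check l4t-rootfs-validation-config.service on your system for where the actual script must live):

```shell
#!/bin/sh
# Hypothetical rootfs-validation script: exit status 0 means "boot
# successful", so nv_update_verifier.service will go on to run
# /usr/sbin/nv_update_engine --verify.

validate_boot() {
    # Consider the boot good only once our application has written
    # its health-marker file (the path is an assumption).
    [ -f "$1" ]
}

# In the real hook, the function's exit status is the verdict:
#   validate_boot /var/run/my-app-healthy
```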

here’s a see-also topic for your reference: Topic 197124.
thanks

Hello JerryChang,

thanks for this on-point summary. We had already read a lot about nv_update_verifier.service, but this whole mechanism only runs after a kernel has booted successfully. We plan to use it to verify that our deployed application(s) are fine, but it's not useful for guaranteeing that changes to e.g. the bootloader, kernel, device tree, or rootfs are fine.

So what is described in Topic 197124 means that the current bootloader in 32.6 is not capable of what we are currently testing, right?

And only in 32.7 will this be fixed so that the bootloader counts down the retries on a failed boot, is that right?

hello brootux,

you’re talking about bootloader redundancy and also rootfs redundancy.
May I know your test procedure: did you crash slot B intentionally and force the board to boot into slot B for verification?

Hello Jerry Chang,

yes, there is a detailed description at the top of this thread. Here is a short summary:

  • We have a unified bootloader (our understanding of this is that we have two slots, each with its own BL and rootfs)
  • We flashed both slots with the same image via ROOTFS_AB=1 ./flash.sh ...
  • We crashed slot-b by replacing /boot/Image in slot-b-rootfs with an empty file
  • We force it to boot from slot-b via nvbootctrl set-active-boot-slot 1

From what we can observe (only a monitor attached), the bootloader does not retry booting slot B and also does not fall back to slot A, which is untouched and should boot.

hello brootux,

these test steps are incorrect: this is CBoot loading the kernel image via the file system.
Please refer to the CBoot section: with [Kernel Boot Sequence Using extlinux.conf], the kernel binary file is loaded from the LINUX entry; otherwise, the kernel binary is loaded from the kernel partition.

So the correct test steps are to remove the LINUX entry and load the kernel from the partition. You may examine all the partitions as follows, i.e. $ ls -al /dev/disk/by-partlabel, and use the dd command to crash the partition. Reboot the system and check the bootloader logs; it will retry 7 times (the bootloader-side default retry count) and finally boot into the other slot.
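A minimal sketch of that test, assuming the slot-B kernel partition is labeled kernel_b (confirm the name via ls -al /dev/disk/by-partlabel first; the dd is destructive):

```shell
#!/bin/sh
# Sketch of the partition-crash test: make slot B's kernel unloadable.

crash_kernel_partition() {
    # Zero out the first MiB of the given block device so the
    # bootloader can no longer load a valid kernel from it.
    dd if=/dev/zero of="$1" bs=1M count=1 conv=notrunc 2>/dev/null
}

# Real usage (destructive! double-check the target label, run as root):
#   crash_kernel_partition /dev/disk/by-partlabel/kernel_b
#   nvbootctrl set-active-boot-slot 1   # force slot B
#   reboot                              # expect retries, then fallback to A
```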

Hello JerryChang,

thanks for the insight. We flashed again, removed the LINUX entry, and tried crashing the partition by writing all zeroes. With this test we realized that after resetting the board by hand 7 times, we ran into the fallback. So it seems the problem was that we assumed the bootloader would reboot automatically if the kernel couldn't be loaded.

Is there any timeout that reboots the board when the kernel takes too long to load or does not work at all?

hello brootux,

you may dig into the bootloader logs; please set up the serial console via port J501.
The retry count should work by itself (reducing the retry count and reloading the binaries automatically); please also share the logs for reference,
thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.