Redundant A/B rootfs does not switch automatically after OTA

Hi,
We use a Jetson AGX Xavier with L4T R35.6.0 on our own custom carrier board (not the devkit).

Before the OTA update, the active boot slot is A:

root@nvidia:/program/disk# nvbootctrl get-current-slot
0
root@nvidia:/program/disk# nvbootctrl -t bootloader dump-slots-info
Current version: 35.6.0
Capsule update status: 0
Current bootloader slot: A
Active bootloader slot: A
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal
root@nvidia:/program/disk# nvbootctrl -t rootfs dump-slots-info
Current rootfs slot: A
Active rootfs slot: A
num_slots: 2
slot: 0,             retry_count: 3,             status: normal
slot: 1,             retry_count: 3,             status: normal

We use the nv_ota_start.sh script to update slot B:

cd ${WORKDIR}/Linux_for_Tegra/tools/ota_tools/version_upgrade
./nv_ota_start.sh ${OTA_PAYLOAD}

After the upgrade completed, we rebooted the device:

reboot

In theory, once nv_ota_start.sh has finished and the system reboots, the boot slot should switch automatically from slot A to slot B.
However, the boot slot did not switch:

root@nvidia:/home/nvidia# nvbootctrl get-current-slot
0
root@nvidia:/home/nvidia# nvbootctrl -t bootloader dump-slots-info
Current version: 35.6.0
Capsule update status: 0
Current bootloader slot: A
Active bootloader slot: A
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal
root@nvidia:/home/nvidia# nvbootctrl -t rootfs dump-slots-info
Current rootfs slot: A
Active rootfs slot: A
num_slots: 2
slot: 0,             retry_count: 3,             status: normal
slot: 1,             retry_count: 3,             status: normal
root@nvidia:/home/nvidia# mount | grep mmcblk0
/dev/mmcblk0p1 on / type ext4 (rw,relatime)
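
The before/after comparison above could be scripted roughly as follows. This is only a sketch: it assumes the "Current rootfs slot: X" line format shown in the output above, and the function names current_rootfs_slot and slot_switched are hypothetical helpers, not part of the OTA tools.

```shell
#!/bin/sh
# Hypothetical helpers; they only parse text in the format shown above.

# Print the slot letter found in `nvbootctrl -t rootfs dump-slots-info`
# output, which is read from stdin.
current_rootfs_slot() {
    sed -n 's/^Current rootfs slot: *//p'
}

# Return 0 only if both slot names are non-empty and differ,
# i.e. the slot recorded before the OTA is not the slot after the reboot.
slot_switched() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Usage on the device (not executed here; the "before" value has to be
# saved somewhere that survives the reboot, e.g. a file):
#   before=$(nvbootctrl -t rootfs dump-slots-info | current_rootfs_slot)
#   ... run nv_ota_start.sh, reboot ...
#   after=$(nvbootctrl -t rootfs dump-slots-info | current_rootfs_slot)
#   slot_switched "$before" "$after" || echo "slot did not switch"
```

With the output above, both values would be "A", so slot_switched returns non-zero.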

Why did this anomaly occur?
File slotA_ota_slotB.txt below is the OTA upgrade log. File slotA_ota_slotB_dmesg.txt is the kernel dmesg log.
slotA_ota_slotB.txt (19.4 KB)
slotA_ota_slotB_dmesg.txt (92.5 KB)

In the logs, I found the following warnings:

[  234.492352] FAT-fs (mmcblk0p44): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[  234.492352] FAT-fs (mmcblk0p44): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[  234.618883] FAT-fs (mmcblk0p44): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[  234.618883] FAT-fs (mmcblk0p44): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[  285.600374] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
[  382.828290] FAT-fs (mmcblk0p44): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[  382.828290] FAT-fs (mmcblk0p44): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[  383.194220] FAT-fs (mmcblk0p44): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[  383.194220] FAT-fs (mmcblk0p44): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.

/dev/mmcblk0p44 is the ESP (EFI system partition containing the L4T Launcher).
Is something wrong with the ESP?
Please help us!
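
For anyone who wants to check their own logs for the same warning, here is a small sketch that lists the partitions flagged by these FAT-fs messages. It assumes the kernel message wording shown above; dirty_fat_partitions is a hypothetical helper name.

```shell
#!/bin/sh
# Print each partition that the kernel flagged as an uncleanly unmounted
# FAT volume. Reads kernel log text (for example from `dmesg`) on stdin.
# Repeated warnings for the same partition are collapsed into one line.
dirty_fat_partitions() {
    sed -n 's/.*FAT-fs (\([^)]*\)): Volume was not properly unmounted.*/\1/p' | sort -u
}

# Usage on the device (not executed here):
#   dmesg | dirty_fat_partitions
```

Fed the excerpt above, it prints only mmcblk0p44; the EXT4-fs line for mmcblk0p2 is ignored.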

From the post "Redundant A/B rootfs not switching with set-active-boot-slot but working with set-SR-BR", we learned about a bug in UEFI.
Following that post’s recommendation, we upgraded UEFI to the latest version (Jetson UEFI firmware (version 202210.5-c101ba51-dirty built on 2025-05-20T11:02:23+08:00)).

The issue can be reproduced the same way as described in the post (Redundant A/B rootfs not switching with set-active-boot-slot but working with set-SR-BR - #20 by diogojusten): reboot the device many times (e.g., 500 times), then run nv_ota_start.sh to perform the OTA upgrade. After that, the slot fails to switch automatically.

hello newbie.lei,

May I have the details of your OTA update process?
Could you please set up a serial console to gather the complete UART logs for reference?

BTW,
Topic 308540 was about the UEFI memory leak and the full partition.
Let me recap the error below:
ASSERT [VariableRuntimeDxe] /out/nvidia/bootloader/uefi/Jetson_RELEASE/edk2/MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c(3264): !(((INTN)(RETURN_STATUS)(Status)) < 0)

Our OTA update process is the same as in "Steps Performed on the Jetson Device".
Here are our detailed steps:

OTA_BASE_DIR="/program/disk"
OTA_TOOL=${OTA_BASE_DIR}/"ota_tools_R35.6.0_aarch64.tbz2"
OTA_PAYLOAD=${OTA_BASE_DIR}/"ota_payload_package.tar.gz"

WORKDIR=${OTA_BASE_DIR}/workdir
mkdir -p ${WORKDIR}
export WORKDIR=${WORKDIR}

tar -xf ${OTA_TOOL} -C ${WORKDIR}
cd ${WORKDIR}/Linux_for_Tegra/tools/ota_tools/version_upgrade
./nv_ota_start.sh ${OTA_PAYLOAD}

After nv_ota_start.sh finished without errors, we rebooted the target board.

The complete UART log was already uploaded earlier.
Please see:
slotA_ota_slotB_dmesg.txt (92.5 KB)

Another suspicious point: after the nv_ota_start.sh upgrade completes and the device reboots, UEFI is entered twice. The first time, it prints the following:

Jetson UEFI firmware (version 202210.5-c101ba51-dirty built on 2025-05-20T11:02:
23+08:00)
ESC   to enter Setup.
F11   to enter Boot Manager Menu.
Enter to continue boot.
▒▒▒▒Shutdown state requested 1
Rebooting system ...
▒▒
[0000.055] W> RATCHET: MB1 binary ratchet value 4 is larger than ratchet level 2 from HW fuses.
[0000.063] I> MB1 (prd-version: 2.6.0.0-t194-41334769-cab45716)

Is this behavior abnormal?
Detailed logs can be found in the attachment slotA_ota_slotB_dmesg.txt.

It looks like the OTA update process did not even start.

FYI, an image-based OTA update consists of updating the rootfs and updating the bootloader. The rootfs is updated first; the bootloader update is executed in UEFI. Once the rootfs update has finished, the device reboots, and UEFI then updates the bootloader through a UEFI capsule update.
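
To see whether the UEFI capsule step reported anything, the "Capsule update status" line of nvbootctrl -t bootloader dump-slots-info can be read after the reboot. The sketch below is only illustrative: the function names are hypothetical, and the status meanings in the case statement are an assumption based on my reading of the developer guide, so please verify them against the guide for your release.

```shell
#!/bin/sh
# Print the numeric value from the "Capsule update status:" line.
# Reads `nvbootctrl -t bootloader dump-slots-info` output on stdin.
capsule_update_status() {
    sed -n 's/^Capsule update status: *//p'
}

# Map a status number to a short description.
# ASSUMPTION: these meanings are recalled from the developer guide and
# should be checked against the documentation for your release.
describe_capsule_status() {
    case "$1" in
        0) echo "no capsule update" ;;
        1) echo "capsule update succeeded" ;;
        2) echo "capsule installed, but booting the new firmware failed" ;;
        3) echo "capsule install failed" ;;
        *) echo "unknown status: $1" ;;
    esac
}

# Usage on the device (not executed here):
#   status=$(nvbootctrl -t bootloader dump-slots-info | capsule_update_status)
#   describe_capsule_status "$status"
```

A status that is still 0 after the OTA reboot would suggest that the bootloader half of the update was never applied.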

Let me recap part of the rootfs update logs for your reference:

[    9.306751] Finding OTA work dir on external storage devices
Looking for OTA work directory on the device(s): /dev/mmcblk0p1
[    9.315908] mount /dev/mmcblk0p1 /mnt
[    9.344126] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[    9.346019] is_boot_part_for_disk_enc /dev/mmcblk0p1 /mnt
[    9.365730] Set rootfs=/dev/mmcblk0p1
[    9.367231] Set dm_crypt=
/mnt/ota_work /
Create log file at /mnt/ota_log/ota_19700101-000009.log
[    9.373085] init_ota_log /mnt/ota_log
[    9.377734] OTA_LOG_FILE=/mnt/ota_log/ota_19700101-000009.log
[    9.379120] init_exception_handler /mnt /mnt/ota_log/ota_19700101-000009.log 0
[    9.381276] Running nv_ota_validate.sh
...

Anyway, these logs look suspicious.
Could you please try fsck to check and repair the file system?

[  234.492352] FAT-fs (mmcblk0p44): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[  234.492352] FAT-fs (mmcblk0p44): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[  234.618883] FAT-fs (mmcblk0p44): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.

Thank you for your reply.
The rootfs upgrade appears to have succeeded. When I attempted to mount /dev/mmcblk0p2 (slotB’s rootfs), I found that all changes I made to the rootfs had been successfully applied.

So the current issue seems to be a failed bootloader update in UEFI?

When I check /dev/mmcblk0p44 with fsck, this is the output:

root@nvidia:/home/nvidia# fsck.vfat -nv /dev/mmcblk0p44
fsck.fat 4.1 (2017-01-24)
Checking we can access the last sector of the filesystem
0x41: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
 Automatically removing dirty bit.
Boot sector contents:
System ID "mkfs.fat"
Media byte 0xf8 (hard disk)
       512 bytes per logical sector
       512 bytes per cluster
        32 reserved sectors
First FAT starts at byte 16384 (sector 32)
         2 FATs, 32 bit entries
    516608 bytes per FAT (= 1009 sectors)
Root directory start at cluster 2 (arbitrary size)
Data area starts at byte 1049600 (sector 2050)
    129022 data clusters (66059264 bytes)
32 sectors/track, 64 heads
         0 hidden sectors
    131072 sectors total
Checking for unused clusters.
Checking free cluster summary.
Leaving filesystem unchanged.
/dev/mmcblk0p44: 6 files, 183/129022 clusters

How should we resolve a failed bootloader update in UEFI?

We used fsck to repair the damaged partition /dev/mmcblk0p44, but the device still fails to automatically switch slots after OTA updates.
Please help us.

hello newbie.lei,

Is the corruption warning gone? Could you share the latest UART logs captured while performing the OTA update?

Yes, the corruption warning no longer appears.
Below is the complete log from slotA OTA to slotB. Please help analyze it.
[com COM26] (2025-09-04_151623) 绿联232.log (285.7 KB)

[2025-09-04 15:16:23] We start the machine.
[2025-09-04 15:17:59] We run nv_ota_start.sh.
[2025-09-04 15:22:10] nv_ota_start.sh finishes; we reboot the machine.
[2025-09-04 15:23:50] The reboot finishes; we check the slot information, but the device is still in slot A.

The bootloader still doesn’t seem to have been upgraded.

UEFI is entered twice.
The first time, it prints:

[2025-09-04 15:22:35]  
[2025-09-04 15:22:35]  Jetson UEFI firmware (version 202210.5-c101ba51-dirty built on 2025-05-20T11:02:
[2025-09-04 15:22:37]  23+08:00)
[2025-09-04 15:22:37]  ESC   to enter Setup.
[2025-09-04 15:22:37]  F11   to enter Boot Manager Menu.
[2025-09-04 15:22:37]  Enter to continue boot.
[2025-09-04 15:22:37]  Shutdown state requested 1
[2025-09-04 15:22:38]  Rebooting system ...
[2025-09-04 15:22:38]  
[2025-09-04 15:22:38]  [0000.055] W> RATCHET: MB1 binary ratchet value 4 is larger than ratchet level 2 from HW fuses.

What does the message "Shutdown state requested 1" mean?

hello newbie.lei,

That should be the first UEFI pass, which updates the bootloader and then sends a request to restart the system.
Could you please check "Generating the Capsule Update Payload" in the developer guide and test again?

Hi,
We tried the steps in "Generating the Capsule Update Payload".
With them, the bootloader is upgraded successfully:

Jetson UEFI firmware (version 202210.5-c101ba51-dirty built on 2025-05-20T11:02:
23+08:00)
ESC   to enter Setup.
F11   to enter Boot Manager Menu.
Enter to continue boot.

Update Progress - 100% **************************************************▒▒▒▒Shutdown state requested 1
Rebooting system ...
▒▒
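
Compared with the earlier failing capture, the visible difference here is the "Update Progress" line. A rough way to check a saved UART log for it is sketched below; it assumes UEFI prints "Update Progress - N%" whenever it applies a capsule, as in the output above, and capsule_progress_seen is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if the UART capture read from stdin contains the capsule
# progress line that UEFI printed in the successful run above.
capsule_progress_seen() {
    grep -q 'Update Progress - [0-9][0-9]*%'
}

# Usage (not executed here; uart.log is a placeholder file name):
#   capsule_progress_seen < uart.log && echo "capsule update ran"
```

On the earlier log from the nv_ota_start.sh run this returns non-zero, because only "Shutdown state requested 1" was printed.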

However, the A/B redundancy that was previously enabled is now disabled:

root@nvidia:/home/nvidia# nvbootctrl get-current-slot
0
root@nvidia:/home/nvidia# nvbootctrl -t bootloader dump-slots-info
Current version: 35.6.0
Capsule update status: 1
Current bootloader slot: A
Active bootloader slot: A
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal
root@nvidia:/home/nvidia# nvbootctrl -t rootfs dump-slots-info
RootFS A/B is not enabled.

Before we followed "Generating the Capsule Update Payload", A/B redundancy was enabled:

root@nvidia:/program/disk# nvbootctrl get-current-slot
0
root@nvidia:/program/disk# nvbootctrl -t bootloader dump-slots-info
Current version: 35.6.0
Capsule update status: 0
Current bootloader slot: A
Active bootloader slot: A
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal
root@nvidia:/program/disk# nvbootctrl -t rootfs dump-slots-info
Current rootfs slot: A
Active rootfs slot: A
num_slots: 2
slot: 0,             retry_count: 3,             status: normal
slot: 1,             retry_count: 3,             status: normal

This still doesn’t meet our expectations.
We need both A/B redundancy and working OTA updates.

hello newbie.lei,

Rootfs A/B might have been disabled because of a version mismatch between the bootloader and the rootfs after the capsule update.
You may run the following command before and after the update to check:
# xxd /sys/firmware/efi/efivars/RootfsRedundancyLevel-781e084c-a330-417c-b678-38e696380cb9

Yes,
before update:

xxd /sys/firmware/efi/efivars/RootfsRedundancyLevel-781e084c-a330-417c-b678-38e696380cb9
00000000: 0600 0000 0100 0000                      ........

We then upgraded following "Generating the Capsule Update Payload".

After update:

xxd /sys/firmware/efi/efivars/RootfsRedundancyLevel-781e084c-a330-417c-b678-38e696380cb9
00000000: 0600 0000 0000 0000                      ........
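
For reading these dumps: a file under /sys/firmware/efi/efivars starts with a 4-byte attribute field, and the variable's data follows. So only the last four bytes above differ (01 00 00 00 before, 00 00 00 00 after). Below is a small sketch for extracting that value; it assumes the payload is a single 32-bit little-endian integer, which matches the 8-byte dumps above, and efivar_u32_value is a hypothetical helper name.

```shell
#!/bin/sh
# Decode the 32-bit little-endian payload of an efivarfs file.
# $1: the file contents as plain hex, for example from `xxd -p <file>`.
efivar_u32_value() {
    hex=$1
    # Hex characters 1-8 are the 4 attribute bytes; 9-16 are the payload.
    b0=$(printf '%s' "$hex" | cut -c9-10)
    b1=$(printf '%s' "$hex" | cut -c11-12)
    b2=$(printf '%s' "$hex" | cut -c13-14)
    b3=$(printf '%s' "$hex" | cut -c15-16)
    # Reassemble with the most significant byte first, print as decimal.
    printf '%d\n' "0x${b3}${b2}${b1}${b0}"
}

# Usage on the device (not executed here):
#   f=/sys/firmware/efi/efivars/RootfsRedundancyLevel-781e084c-a330-417c-b678-38e696380cb9
#   efivar_u32_value "$(xxd -p "$f")"
```

For the two dumps above this gives 1 before the update and 0 after it, which lines up with nvbootctrl reporting rootfs A/B as enabled before and not enabled after.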

Why is there a version mismatch between the bootloader and the rootfs after the capsule update?

Our primary objective is to resolve the issue where A/B redundancy fails to switch slots automatically after an OTA update.
Why does this happen, and how can we resolve it?
We have successfully reproduced this issue on multiple machines, and it will impact critical functionality of our product. Please help us resolve this.

hello newbie.lei,

Please also share the commands you use to create the OTA payload.

Hi, our commands for creating OTA payload are:

cd /NVME/Jetson/ADOB
export BASE_BSP=/NVME/Jetson/ADOB/Linux_for_Tegra
export TARGET_BSP=/NVME/Jetson/ADOB/Linux_for_Tegra
cd ${TARGET_BSP}/../
sudo tar xpf ota_tools_R35.6.0_aarch64.tbz2
cd ${TARGET_BSP}
sudo -E ROOTFS_AB=1 ./tools/ota_tools/version_upgrade/l4t_generate_ota_package.sh uih_adob R35-6

Additionally, we had already flashed the device successfully with flash.sh before creating the OTA payload.
We followed the document section "Steps Performed on the Host Machine" when creating the payload.

We also attempted to upgrade the ESP partition during the OTA update using the following command:

sudo -E ROOTFS_AB=1 ./tools/ota_tools/version_upgrade/l4t_generate_ota_package.sh -E ${TARGET_BSP}/bootloader/esp.img uih_adob R35-6

However, after the OTA update completed, the device still failed to switch slots automatically (because the bootloader was not upgraded).

hello newbie.lei,

It’s suggested to update l4t_generate_ota_package.sh and ota_board_specs.conf for your customized board before creating OTA payloads.
Please also see Topic 332980 for reference.