Bootloader does not fall back to slot A when slot B cannot boot (rootfs A/B)

We are experimenting with rootfs A/B to implement our own image update process.

We plan to use the APT approach (NOT the image approach), so our plan is to:

  • flash two partitions with the OS (rootfs A/B)
  • mount the rootfs of the non-current slot X and make changes to it, e.g. via chroot
  • select the changed slot for booting (nvbootctrl set-active-boot-slot X) and reboot
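The update cycle above can be sketched as a small Python helper that builds the command sequence for the non-current slot and, by default, only prints it. This is a minimal sketch under stated assumptions: the partition labels APP/APP_b, the mount point, and the apt-get call standing in for "make changes via chroot" are hypothetical placeholders, and we assume nvbootctrl get-current-slot prints 0 (A) or 1 (B).

```python
import subprocess
from typing import List

# ASSUMPTION: partition labels of the two rootfs slots; check them on the
# target (e.g. `ls /dev/disk/by-partlabel`) before relying on this.
ROOTFS_PARTLABEL = {0: "APP", 1: "APP_b"}


def current_slot() -> int:
    """Ask nvbootctrl which slot we booted from (assumed to print 0 or 1)."""
    out = subprocess.run(["nvbootctrl", "get-current-slot"],
                         capture_output=True, text=True, check=True).stdout
    return int(out.strip())


def other_slot(current: int) -> int:
    """Return the non-current slot (0 = A, 1 = B)."""
    if current not in (0, 1):
        raise ValueError(f"unexpected slot number: {current}")
    return 1 - current


def update_plan(current: int, mnt: str = "/mnt/other") -> List[List[str]]:
    """Commands for one update cycle on the non-current slot.

    The apt-get call is only a placeholder for "make changes via chroot";
    a real chroot usually also needs /dev, /proc and /sys bind-mounted.
    """
    target = other_slot(current)
    dev = f"/dev/disk/by-partlabel/{ROOTFS_PARTLABEL[target]}"
    return [
        ["mount", dev, mnt],
        ["chroot", mnt, "apt-get", "-y", "upgrade"],
        ["umount", mnt],
        ["nvbootctrl", "set-active-boot-slot", str(target)],
        ["reboot"],
    ]


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

On the board, run(update_plan(current_slot())) prints the plan; passing dry_run=False actually executes it. Keeping a dry-run default makes it easy to review the sequence before touching the other slot.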

All of the above has worked so far. However, when we test the fallback to slot Y for the case where slot X is not bootable, the Xavier hangs and never falls back, no matter how long we wait.

To make slot X unbootable, we replaced the file /boot/Image once with an empty file and once with a link to a valid Image file. With the empty file in place, the board hung at the NVIDIA splash screen; with the link, we got a black screen with no splash at all. So it seems the bootloader either does not detect the problem or does not fall back to slot Y.

We are using the latest L4T 32.6 tarball and flashed the board via ROOTFS_AB=1 ./nvflash.sh jetson-xavier mmcblk0p1

Here is our smd file from bootloader/smd_info.rootfs_AB.cfg. Note that we set MAX_BL_RETRY_COUNT and MAX_ROOTFS_AB_RETRY_COUNT to allow only one failed attempt. This was to reduce the wait time until the bootloader detects an unsuccessful boot.

# SMD metadata information
< VERSION 5 >
# Set the maximum boot slot retry count
# Please make sure this field is set before slot info config
# The valid setting is 1 to 7
< MAX_BL_RETRY_COUNT 1 >

# Set the maximum rootfs slot retry count
# Please make sure this field is set before slot info config
# The valid setting is 1 to 3
< MAX_ROOTFS_AB_RETRY_COUNT 1 >

#
# Config 1: Disable A/B support (by removing comments ##)
#

# slot info order is important!
# <priority>    <suffix>  <boot_successful>
##15                  _a        1

#
# Config 2: Enable rootfs A/B support (default)
#
< REDUNDANCY_ENABLE 1 >
< ROOTFS_AB 1 >

# To enable rootfs autosync, use < RF_AUTOSYNC_ENABLE 1 >
# This option must be defined after "< ROOTFS_AB 1 >"
##< RF_AUTOSYNC_ENABLE 1 >

# Select rootfs A as the active rootfs
< ROOTFS_ACTIVE_A 1 >
##< ROOTFS_ACTIVE_B 1 >

# Enable/disable unified bootloader AB and rootfs AB
# Set 1 to enable, set 0 to disable. Default is enabled.
# This option must be defined after "< ROOTFS_AB 1 >"
# When < ROOTFS_BL_UNIFIED_AB 1 > is set,
# auto sync for both BL and RF are disabled.
< ROOTFS_BL_UNIFIED_AB 1 >

# To disable bootloader autosync, use < BL_AUTOSYNC_DISABLE 1 >, default is disabled.
# REDUNDANCY_ENABLE or REDUNDANCY_USER must be defined before BL_AUTOSYNC_DISABLE !
< BL_AUTOSYNC_DISABLE 1 >

# slot info order is important!
# <priority>    <suffix>  <boot_successful>
15                  _a        1
14                  _b        1
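Because the comments in this file state several ordering and range rules, a small checker can catch mistakes before running the generator. The sketch below is our own parser, not part of L4T; it only encodes the rules quoted in the comments above (retry ranges, retry counts before the slot info, and the "must be defined after" constraints), and nv_smd_generator remains the authoritative validator.

```python
import re
from typing import Dict, List, Optional, Tuple

SETTING_RE = re.compile(r"^<\s*(\w+)\s+(\S+)\s*>$")   # e.g. "< ROOTFS_AB 1 >"
SLOT_RE = re.compile(r"^(\d+)\s+(_[a-z])\s+([01])$")   # e.g. "15  _a  1"

# Rules transcribed from the comments in smd_info.rootfs_AB.cfg.
RANGES = {"MAX_BL_RETRY_COUNT": (1, 7), "MAX_ROOTFS_AB_RETRY_COUNT": (1, 3)}
MUST_FOLLOW = {
    "RF_AUTOSYNC_ENABLE": ("ROOTFS_AB",),
    "ROOTFS_BL_UNIFIED_AB": ("ROOTFS_AB",),
    "BL_AUTOSYNC_DISABLE": ("REDUNDANCY_ENABLE", "REDUNDANCY_USER"),
}


def parse(text: str):
    """Return (settings, line index per key, slot rows, first slot line)."""
    settings: Dict[str, str] = {}
    order: Dict[str, int] = {}
    slots: List[Tuple[int, str, int]] = []
    first_slot: Optional[int] = None
    for idx, raw in enumerate(text.splitlines()):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, comment, or "##"-disabled entry
        if m := SETTING_RE.match(line):
            settings[m.group(1)] = m.group(2)
            order[m.group(1)] = idx
        elif m := SLOT_RE.match(line):
            slots.append((int(m.group(1)), m.group(2), int(m.group(3))))
            if first_slot is None:
                first_slot = idx
        else:
            raise ValueError(f"unrecognised line {idx + 1}: {raw!r}")
    return settings, order, slots, first_slot


def check(text: str) -> List[str]:
    """List rule violations; an empty list means no problems were found."""
    settings, order, slots, first_slot = parse(text)
    problems: List[str] = []
    for key, (lo, hi) in RANGES.items():
        if key not in settings:
            continue
        if not lo <= int(settings[key]) <= hi:
            problems.append(f"{key}={settings[key]} is outside {lo}..{hi}")
        if first_slot is not None and order[key] > first_slot:
            problems.append(f"{key} must be set before the slot info")
    for key, anchors in MUST_FOLLOW.items():
        if key in order and not any(
                a in order and order[a] < order[key] for a in anchors):
            problems.append(f"{key} must come after {' or '.join(anchors)}")
    if settings.get("ROOTFS_AB") == "1" and {s for _, s, _ in slots} != {"_a", "_b"}:
        problems.append("rootfs A/B expects exactly the slots _a and _b")
    return problems
```

Running check() on the file above should report no problems; setting MAX_ROOTFS_AB_RETRY_COUNT to 5, for example, would be flagged as outside 1..3.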

We applied these adjustments by running ./nv_smd_generator smd_info.rootfs_AB.cfg slot_metadata.bin.rootfsAB.
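To make explicit what we expect from a retry count of 1, here is a simplified model of the A/B fallback logic as we understand it. This is our assumption, not NVIDIA's implementation: we assume that set-active-boot-slot clears the target slot's boot_successful flag and resets its retry budget, and that a retry is only consumed when the bootloader runs again. The hypothetical reset_on_failure parameter makes the second assumption visible: in the model, a board that hangs without ever resetting never reaches the fallback.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Slot:
    name: str
    priority: int       # higher value wins, as in the <priority> column
    retries_left: int   # starts at the configured maximum retry count
    successful: bool    # the <boot_successful> flag


def simulate(slots: List[Slot], os_boots: Dict[str, bool],
             reset_on_failure: bool, max_boots: int = 10) -> List[str]:
    """Return the sequence of slots the bootloader tries to boot.

    os_boots[name] says whether the OS in that slot comes up and marks
    itself successful. reset_on_failure says whether a failed boot ends in
    a reset (e.g. watchdog or power cycle) or in a permanent hang.
    """
    attempts: List[str] = []
    for _ in range(max_boots):
        # A slot stays bootable while it is marked successful or still has
        # retries left; otherwise the bootloader skips it.
        bootable = [s for s in slots if s.successful or s.retries_left > 0]
        if not bootable:
            break
        active = max(bootable, key=lambda s: s.priority)
        attempts.append(active.name)
        if not active.successful:
            active.retries_left -= 1  # each unconfirmed attempt costs one retry
        if os_boots[active.name]:
            active.successful = True  # userspace confirms the boot
            break
        if not reset_on_failure:
            break  # board hangs; the bootloader never runs again
    return attempts


def fresh() -> List[Slot]:
    # Slot B was just activated (assumed: not yet successful, 1 retry left).
    return [Slot("A", 14, 1, True), Slot("B", 15, 1, False)]


print(simulate(fresh(), {"A": True, "B": False}, reset_on_failure=True))
print(simulate(fresh(), {"A": True, "B": False}, reset_on_failure=False))
```

Under these assumptions the first call tries B and then falls back to A, while the second call stops after B, which would match the hang we observe.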

Are there other settings needed for our simple test to succeed?

Hi,
Please refer to this page and capture the UART log for reference:
Jetson/General debug - eLinux.org

Do you observe the issue on the Xavier developer kit or on a custom board?

@DaneLLL Okay, we will try to get the log via UART, but this could take a few days, since it seems we need special hardware for this.

We observed this on a “custom” board: the “stevie-xavier” from Diamond Systems (STEVIE™ carrier for NVIDIA Jetson AGX Xavier). We will also check this on the Xavier devkit again.

Just to confirm: are we missing any step in our description above? Should this work on a Xavier devkit the way we are doing it?