UUID A/B Bank Consistency When Updating TX2 using CBoot

I have a Jetson TX2 setup with A/B rootfs enabled (ROOTFS_AB=1), using cboot as the primary Linux bootloader (USE_UBOOT=0). The Nvidia build tools (flash.sh) generate kernel images with the rootfs partition UUIDs baked directly into their command-line parameters. In this scenario, the system executes cboot for a particular boot chain (A or B), which then execs the image from the matching kernel partition. That kernel then finds the corresponding rootfs by matching the partition UUID in the GPT.

If I want to do OTA updates on this system, the kernel images need consistent UUIDs that always match the A/B rootfs banks. Even if I keep those values constant in my build system, though, the Nvidia update generation tools (l4t_generate_ota_package.sh, BUP_generator.py, etc.) always build the update artifact from the bank A kernel image. The resulting updates always use the bank A rootfs, even when the system is configured to boot the bank B slot.

Ideally, the update generation tooling would package both kernel images as bup_payload images (A and B), and the update application logic would choose the appropriate image at install time.

Is there any way to achieve this with the L4T 32.7.3 release?

My other options are to modify cboot to inject the correct rootfs path when it execs the kernel, or simply to change the partition UUID in the GPT at update install time.

Hi calvinmccoy,

If slot B is active, the OTA update will update slot A.

Please refer to the following instructions for an Image-Based OTA Update:
Over-the-Air Update - Preparing for an Image-Based OTA Update
If rootfs A/B is enabled on the target, Bootloader partitions and rootfs partitions on the inactive slot are updated in step 6. The target board then reboots to the slot just updated.

You could also refer to A/B System Update.

The update process you just described is exactly what I’m seeing. The issue is that the kernel image being installed is not always correct for the system configuration.

The update lifecycle has two parts:

  • artifact generation
  • artifact installation

When I run

USE_UBOOT=0 ROOTFS_AB=1 ./tools/ota_tools/version_upgrade/l4t_generate_ota_package.sh -s -o $ROOTFS_UPDATER -f image_fs.tar.gz jetson-tx2-devkit R32-6

I get an update package that includes (among other things) a “generic” rootfs image and a kernel image in the bup_payload that will ALWAYS point to rootfs A. The kernel image is assembled with root=PARTUUID=<ROOTFS-A_UUID> from l4t-rootfs-uuid.txt, and I don’t see any way, based on the documentation, to change that.
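To confirm which rootfs a kernel image will ask for before it is installed, one can read the command line stored in the image header. Below is a minimal sketch, assuming boot.img uses the Android v0 boot image header (the cboot log later checks the "boot.img header magic"), where the command line is a 512-byte NUL-terminated field at offset 64. It only reads the header: arguments cboot adds at boot time (such as the slot suffixes) are not shown, and if the image inside the BUP is signed or wrapped, the header may sit at a different offset.

```python
# Sketch: print the root=PARTUUID baked into a boot.img header.
# ASSUMPTION: Android boot image v0 layout (magic[8], ten uint32 fields,
# name[16], cmdline[512]); not confirmed for every L4T image variant.
import sys

BOOT_MAGIC = b"ANDROID!"
CMDLINE_OFFSET = 64   # 8 (magic) + 10 * 4 (uint32 fields) + 16 (name)
CMDLINE_SIZE = 512


def read_boot_img_cmdline(header: bytes) -> str:
    """Return the NUL-terminated kernel command line from the header."""
    if header[:8] != BOOT_MAGIC:
        raise ValueError("not an Android boot image (bad magic)")
    raw = header[CMDLINE_OFFSET:CMDLINE_OFFSET + CMDLINE_SIZE]
    return raw.split(b"\0", 1)[0].decode("ascii", errors="replace")


def root_partuuid(cmdline: str):
    """Return the value of root=PARTUUID=..., or None if absent."""
    prefix = "root=PARTUUID="
    for token in cmdline.split():
        if token.startswith(prefix):
            return token[len(prefix):]
    return None


if __name__ == "__main__":
    # Usage: python3 show_root.py boot.img boot.img_b
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            header = f.read(CMDLINE_OFFSET + CMDLINE_SIZE)
        print(path, root_partuuid(read_boot_img_cmdline(header)))
```

Running it on both boot.img and boot.img_b from a ROOTFS_AB=1 build should show two different PARTUUIDs; running it on the kernel extracted from the OTA package would show which one was packaged.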

As you said, the artifact installation script knows which bank is offline and installs the rootfs and kernel images into the correct partitions, but on reboot that kernel image will always use rootfs A, even if the boot slot is bank B.

Ideally, the artifact generation would package BOTH kernels, and the install process would write the kernel image into the proper partition based on the target slot. That way I could create a single update_payload.tar.gz that updates a device regardless of which slot is currently active.

Do you mean that rootfs_A would be used even if the current slot is B?
Could you provide the lsblk result from your board?

Sorry, I don’t understand what you want to express here. Could you explain it more clearly?

Here is lsblk for the board:

$ lsblk
NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    16M  1 loop 
mmcblk0      179:0    0  29.1G  0 disk 
├─mmcblk0p1  179:1    0    14G  0 part 
├─mmcblk0p2  179:2    0    14G  0 part /
├─mmcblk0p3  179:3    0     4M  0 part 
├─mmcblk0p4  179:4    0     4M  0 part 
├─mmcblk0p5  179:5    0   512K  0 part 
├─mmcblk0p6  179:6    0   512K  0 part 
├─mmcblk0p7  179:7    0   512K  0 part 
├─mmcblk0p8  179:8    0   512K  0 part 
├─mmcblk0p9  179:9    0     3M  0 part 
├─mmcblk0p10 179:10   0     3M  0 part 
├─mmcblk0p11 179:11   0     2M  0 part 
├─mmcblk0p12 179:12   0     4M  0 part 
├─mmcblk0p13 179:13   0     4M  0 part 
├─mmcblk0p14 179:14   0   604K  0 part 
├─mmcblk0p15 179:15   0   604K  0 part 
├─mmcblk0p16 179:16   0     1M  0 part 
├─mmcblk0p17 179:17   0     1M  0 part 
├─mmcblk0p18 179:18   0     2M  0 part 
├─mmcblk0p19 179:19   0     2M  0 part 
├─mmcblk0p20 179:20   0     6M  0 part 
├─mmcblk0p21 179:21   0     6M  0 part 
├─mmcblk0p22 179:22   0     2M  0 part 
├─mmcblk0p23 179:23   0   128M  0 part 
├─mmcblk0p24 179:24   0   128M  0 part 
├─mmcblk0p25 179:25   0    63M  0 part 
├─mmcblk0p26 179:26   0   512K  0 part 
├─mmcblk0p27 179:27   0   256K  0 part 
├─mmcblk0p28 179:28   0   256K  0 part 
├─mmcblk0p29 179:29   0    80M  0 part 
├─mmcblk0p30 179:30   0    80M  0 part 
├─mmcblk0p31 179:31   0   512K  0 part 
├─mmcblk0p32 259:0    0   512K  0 part 
├─mmcblk0p33 259:1    0   300M  0 part 
└─mmcblk0p34 259:2    0 317.8M  0 part 
mmcblk0boot0 179:32   0     4M  1 disk 
mmcblk0boot1 179:64   0     4M  1 disk 
mmcblk0rpmb  179:96   0     4M  0 disk 
mmcblk2      179:128  0 238.3G  0 disk 
└─mmcblk2p1  179:129  0 238.3G  0 part /data
zram0        252:0    0 982.4M  0 disk [SWAP]
zram1        252:1    0 982.4M  0 disk [SWAP]
zram2        252:2    0 982.4M  0 disk [SWAP]
zram3        252:3    0 982.4M  0 disk [SWAP]

Let me back up and describe the incident that led to this. I had some arbitrary U-Boot based builds running on my system when I tried to switch to C-Boot only. The rootfs partitions had partition UUIDs that looked like:

$ sudo blkid
...
/dev/mmcblk0p1: UUID="3f285248-6467-42dc-9771-bb3963de52fa" TYPE="ext4" PARTLABEL="APP" PARTUUID="2a8137cb-0370-497a-8254-58a4f7e39954"
/dev/mmcblk0p2: UUID="70f681d8-2fdf-4a61-8b27-8284b1f26a63" TYPE="ext4" PARTLABEL="APP_b" PARTUUID="e118e918-6555-4df6-9ad3-a035fcf8e888"
...
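To line these PARTUUIDs up against what a kernel expects, it helps to turn the blkid output into a lookup table. Below is a small sketch that parses blkid's default `KEY="value"` output into a `PARTLABEL -> (device, PARTUUID)` map; the function name `parse_blkid` is hypothetical, and lines without a `device:` prefix (such as the `...` elisions above) are skipped.

```python
import shlex


def parse_blkid(output: str) -> dict:
    """Map PARTLABEL -> (device, PARTUUID) from `blkid` output."""
    table = {}
    for line in output.splitlines():
        device, sep, attrs = line.partition(":")
        if not sep:
            continue  # skip lines such as "..." with no "device:" prefix
        fields = {}
        for token in shlex.split(attrs):  # strips the quotes in KEY="value"
            key, eq, value = token.partition("=")
            if eq:
                fields[key] = value
        if "PARTLABEL" in fields and "PARTUUID" in fields:
            table[fields["PARTLABEL"]] = (device.strip(), fields["PARTUUID"])
    return table
```

With the two lines above, `parse_blkid(...)["APP_b"]` gives `("/dev/mmcblk0p2", "e118e918-6555-4df6-9ad3-a035fcf8e888")`.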

On this system, slot 0 was the active boot chain.

I then made a fresh build (which should only use C-Boot) where the UUID in l4t-rootfs-uuid.txt was 6a3a84d8-1a5d-469e-ac15-ba6548d8387d. From this I generated an OTA update package, which installed the kernel image and rootfs into the partitions corresponding to slot 1.

On reboot, C-Boot starts up, detects that we are using slot 1, and finds a Linux kernel image in the KERNEL_B partition:

[0001.225] I> Welcome to Cboot
[0001.228] I> Cboot Version: t186-704e62f2
...
[0003.292] initializing target
[0003.295] calling apps_init()
[0003.298] starting app kernel_boot_app
[0003.321] I> found decompressor handler: lz4-legacy
[0003.325] I> decompressing BMP blob ...
[0003.337] I> Kernel type = Normal
[0003.340] I> ########## Fixed storage boot ##########
[0003.345] I> Loading kernel-bootctrl from partition
[0003.350] I> Loading partition kernel-bootctrl at 0xa8000000 from device(0x1)
[0003.364] W> tegrabl_get_kernel_bootctrl: magic number(0x00000000) is invalid
[0003.371] W> tegrabl_get_kernel_bootctrl: use default dummy boot control data
[0003.378] I> A/B: bin_type (24) slot 1
[0003.393] I> Boot image size read from image header: 20f3008
[0003.399] I> Boot image load address: 0x80400000
[0003.403] I> Loading kernel_b from partition
[0003.407] I> Loading partition kernel_b at 0x80400000 from device(0x1)
[0004.353] I> T18x: Authenticate kernel (bin_type 24), max size 0x4000000
[0004.434] I> Decrypt the buffer ... [0004.437] W> tegrabl_decrypt_block: fuse (0x0) is not burnt to do encryption (0x4); skip decryption.
[0004.446] I> done
[0004.448] I> Checking boot.img header magic ... [0004.452] I> [OK]
[0004.454] I> kernel-dtb is already loaded
[0004.458] I> Validate kernel-dtb ...
[0004.461] I> T18x: Authenticate kernel-dtb (bin_type 21), max size 0x100000
[0004.469] I> Decrypt the buffer ... [0004.472] W> tegrabl_decrypt_block: fuse (0x0) is not burnt to do encryption (0x4); skip decryption.
[0004.481] I> done
[0004.483] I> Kernel hdr @0x80400000
[0004.486] I> Kernel dtb @0x80000000
[0004.489] I> decompressor handler not found
[0004.493] I> Copying kernel image (34619400 bytes) from 0x80400800 to 0x82800000 ... [0004.514] I> Done
[0004.516] I> Move ramdisk (len: 0) from 0x82505000 to 0x96830000
[0004.523] I> Updated bpmp info to DTB
[0004.528] I> Ramdisk: Base: 0x96830000; Size: 0x0
[0004.533] I> Updated initrd info to DTB
[0004.536] W> WARN: Fail to override "console=none" in commandline
[0004.542] I> Active rootfs suffix: _b
[0004.567] I> disabled_core_mask: 0xffffff0c
[0004.571] I> Active slot suffix: _b
[0004.575] I> add_boot_slot_suffix: slot_suffix = _b
[0004.579] I> Linux Cmdline: console=ttyS0,115200 androidboot.presilicon=true firmware_class.path=/etc/firmware root=PARTUUID=6a3a84d8-1a5d-469e-ac15-ba6548d8387d rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 isolcpus=1-2  video=tegrafb earlycon=uart8250,mmio32,0x3100000 nvdumper_reserved=0x2772e0000 gpt rootfs.slot_suffix=_b usbcore.old_scheme_first=1 tegraid=18.1.2.0.0 maxcpus=6 no_console_suspend boot.slot_suffix=_b boot.ratchetvalues=0.2031647.1 vpr_resize bl_prof_dataptr=0x10000@0x275840000 sdhci_tegra.en_boot_part_access=1

In the kernel command line you can see that the root device C-Boot read from the kernel image is the NEW UUID, 6a3a84d8...
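The mismatch is visible in that single line: cboot appends `rootfs.slot_suffix=_b`, but the baked-in `root=PARTUUID=` is the slot A value from l4t-rootfs-uuid.txt. A minimal sketch for pulling those three values out of a command line (for example the contents of /proc/cmdline on a running target); the helper names are hypothetical.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {key: value}; bare flags map to None.
    Repeated keys (console= appears twice here) keep the last value."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def root_and_slot(cmdline: str):
    """Return (root PARTUUID or None, rootfs.slot_suffix, boot.slot_suffix)."""
    params = parse_cmdline(cmdline)
    root = params.get("root") or ""
    partuuid = root[len("PARTUUID="):] if root.startswith("PARTUUID=") else None
    return partuuid, params.get("rootfs.slot_suffix"), params.get("boot.slot_suffix")
```

Applied to the cmdline above, it returns the 6a3a84d8... PARTUUID together with `_b` for both suffixes, i.e. a slot B boot pointing at a slot A UUID.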

C-Boot then execs that kernel with the detected parameters and we start booting Linux:

[0005.106] I> Kernel EP: 0x82800000, DTB: 0x80000000
[    0.000000] Booting Linux on physical CPU 0x100
[    0.000000] Linux version 4.9.253-tegra (xxxx@xxxx) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05) ) #1 SMP PREEMPT Thu Mar 9 00:31:55 UTC 2023
...
[    0.000000] Kernel command line: console=ttyS0,115200 androidboot.presilicon=true firmware_class.path=/etc/firmware root=PARTUUID=6a3a84d8-1a5d-469e-ac15-ba6548d8387d rw rootwait rootfstype=ext4 console=ttyS0,115200n8 console=tty0 fbcon=map:0 net.ifnames=0 isolcpus=1-2  video=tegrafb earlycon=uart8250,mmio32,0x3100000 nvdumper_reserved=0x2772e0000 gpt rootfs.slot_suffix=_b usbcore.old_scheme_first=1 tegraid=18.1.2.0.0 maxcpus=6 no_console_suspend boot.slot_suffix=_b boot.ratchetvalues=0.2031647.1 vpr_resize bl_prof_dataptr=0x10000@0x275840000 sdhci_tegra.en_boot_part_access=1

Once the eMMC device is initialized, the kernel begins waiting for the root device with PARTUUID 6a3a84d8...:

[    3.862003] mmc0: SDHCI controller on 3460000.sdhci [3460000.sdhci] using ADMA 64-bit with 64 bit addr
[    3.866263] mmc1: SDHCI controller on 3440000.sdhci [3440000.sdhci] using ADMA 64-bit with 64 bit addr
[    3.878101] mmc2: SDHCI controller on 3400000.sdhci [3400000.sdhci] using ADMA 64-bit with 64 bit addr
[    3.911650] mmc0: mmc_decode_ext_csd: CMDQ supported: depth: 31, cmdq_support: 1
[    3.927499] mmc0: periodic cache flush enabled
[    3.927505] mmc0: new HS400 Enhanced strobe MMC card at address 0001
[    3.927837] mmcblk0: mmc0:0001 DG4032 29.1 GiB 
[    3.927932] mmcblk0boot0: mmc0:0001 DG4032 partition 1 4.00 MiB
[    3.932153] mmcblk0boot1: mmc0:0001 DG4032 partition 2 4.00 MiB
[    3.932257] mmcblk0rpmb: mmc0:0001 DG4032 partition 3 4.00 MiB
[    3.936993]  mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20 p21 p22 p23 p24 p25 p26 p27 p28 p29 p30 p31 p32 p33 p34
...
[    5.190714] Waiting for root device PARTUUID=6a3a84d8-1a5d-469e-ac15-ba6548d8387d...

But the kernel waits forever, because that partition UUID doesn’t exist in the eMMC GPT. The kernel hangs, and only after several reboots does the bootloader revert to slot 0.
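A pre-reboot check in the installer could catch this hang before it happens: compare the PARTUUID baked into the kernel for the target slot with the PARTUUID the GPT actually holds for that slot's rootfs. Below is a sketch under two assumptions: the `SLOT_ROOTFS_LABEL` mapping is taken from the blkid output above (APP for slot 0, APP_b for slot 1), and `gpt` is a hypothetical `PARTLABEL -> PARTUUID` dict gathered on the target.

```python
# Hypothetical mapping from boot slot to rootfs PARTLABEL, taken from the
# blkid output above (APP = slot 0, APP_b = slot 1).
SLOT_ROOTFS_LABEL = {0: "APP", 1: "APP_b"}


def validate_kernel_root(target_slot: int, kernel_partuuid: str, gpt: dict) -> str:
    """Raise if the kernel for target_slot would wait on a PARTUUID that is
    not the target slot's rootfs. `gpt` maps PARTLABEL -> PARTUUID."""
    label = SLOT_ROOTFS_LABEL[target_slot]
    actual = gpt.get(label)
    if actual is None:
        raise LookupError(f"no partition labelled {label!r} in the GPT")
    if kernel_partuuid.lower() != actual.lower():
        raise ValueError(
            f"slot {target_slot} kernel has root=PARTUUID={kernel_partuuid}, "
            f"but {label} is PARTUUID={actual}; the kernel would hang at "
            "'Waiting for root device'"
        )
    return label
```

For the incident above, `validate_kernel_root(1, "6a3a84d8-...", gpt)` would raise instead of letting the device reboot into a hang.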

This is where my idea to lock the UUIDs comes from. If I pre-set the UUIDs above like this:

$ printf "2a8137cb-0370-497a-8254-58a4f7e39954" > l4t-rootfs-uuid.txt
$ printf "e118e918-6555-4df6-9ad3-a035fcf8e888" > l4t-rootfs-uuid.txt_b

I would at least have real UUIDs that match those already on my target.
I would still need to contend with the artifact builder using only the slot A kernel image. In my example above, the kernel image supplied for the update has the slot 0 UUID baked into it, even though it is eventually written to the slot 1 KERNEL_B partition.
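If the UUIDs are pinned in a build script rather than typed by hand, a little validation avoids feeding flash.sh a malformed or duplicated value. Below is a sketch that produces the two files with the same content as the printf commands (canonical lowercase UUID, no trailing newline). The helper names are hypothetical, and the directory the files must live in is not stated in this thread, so it is left as a parameter.

```python
import uuid
from pathlib import Path


def rootfs_uuid_file_contents(uuid_a: str, uuid_b: str) -> dict:
    """Return {filename: content} for the slot A and slot B rootfs UUID files."""
    canon_a = str(uuid.UUID(uuid_a))  # raises ValueError if malformed
    canon_b = str(uuid.UUID(uuid_b))
    if canon_a == canon_b:
        raise ValueError("slot A and slot B rootfs UUIDs must differ")
    return {"l4t-rootfs-uuid.txt": canon_a, "l4t-rootfs-uuid.txt_b": canon_b}


def write_rootfs_uuid_files(directory, uuid_a: str, uuid_b: str) -> None:
    # No trailing newline, matching the printf commands above.
    for name, value in rootfs_uuid_file_contents(uuid_a, uuid_b).items():
        (Path(directory) / name).write_text(value)
```

This pins the UUIDs but does not address the packaging problem: the OTA tools still ship only the kernel built with the slot A value.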

To answer your second question about what I’m trying to achieve: the primary build process, where I run

USE_UBOOT=0 ROOTFS_AB=1 ./flash.sh --no-flash jetson-tx2 mmcblk0p1

generates two kernel images:

  • boot.img which has root=PARTUUID=2a8137cb.. parameter baked in
  • boot.img_b which has root=PARTUUID=e118e918.. parameter baked in

I then need to run l4t_generate_ota_package.sh to build my update package. What this SHOULD build is two BUP update payloads:

  • bup_payload, built from boot.img
  • bup_payload_b, built from boot.img_b

Then I copy the update package to my target, extract it, and run nv_ota_start.sh. That process SHOULD:

  • detect the update target slot vs. the active slot
  • detect whether there are multiple bup_payload images that correspond to particular slots
  • copy the proper payload to the proper partition
    • copy bup_payload to the KERNEL_A partition if slot 0 is the target slot
    • copy bup_payload_b to KERNEL_B partition if slot 1 is the target slot
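The install-time selection described in these steps is small enough to sketch. Everything here is hypothetical: `SLOT_LAYOUT`, the payload file names, and the slot A partition label `kernel` (the thread calls it KERNEL_A; only `kernel_b` appears in the cboot log). The active slot would come from the target, for example from `nvbootctrl get-current-slot`; this is not how nv_ota_start.sh currently behaves.

```python
# Hypothetical layout: payload file and kernel partition label per slot.
# "kernel_b" matches the cboot log; the slot A label "kernel" is an assumption.
SLOT_LAYOUT = {
    0: ("bup_payload", "kernel"),
    1: ("bup_payload_b", "kernel_b"),
}


def plan_kernel_install(active_slot: int, available_payloads) -> tuple:
    """Return (target_slot, payload_file, kernel_partition) for an A/B update."""
    if active_slot not in SLOT_LAYOUT:
        raise ValueError(f"unknown slot {active_slot}")
    target_slot = 1 - active_slot  # always update the inactive slot
    payload, partition = SLOT_LAYOUT[target_slot]
    if payload not in available_payloads:
        # Do NOT fall back to the slot A payload: that reproduces the hang.
        raise FileNotFoundError(f"{payload} not found for target slot {target_slot}")
    return target_slot, payload, partition
```

The deliberate design choice is to fail rather than fall back when the slot-specific payload is missing, because silently installing the slot A kernel into slot B is exactly the failure described above.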

Does that clearly explain what I’m trying to achieve?

@KevinFFF any thoughts on my setup/problem description?

We’ve pretty much settled on rewriting the target rootfs PARTUUID in the GPT for each update we perform. This keeps the kernel and the target rootfs in sync.
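A sketch of the GPT rewrite step, assuming the gdisk package's `sgdisk` is available on the target. Its `--partition-guid=partnum:guid` option sets a partition's unique GUID, which is what `root=PARTUUID=` matches. The wrapper names are hypothetical; partition 2 (APP_b, mmcblk0p2) comes from the blkid output above, and the GUID to write is whatever is baked into the kernel just installed for that slot. The default is a dry run that only prints the command.

```python
import subprocess
import uuid


def sgdisk_set_partuuid_cmd(disk: str, partnum: int, new_partuuid: str) -> list:
    """Build an sgdisk command that sets one partition's unique GUID (PARTUUID)."""
    if partnum < 1:
        raise ValueError("GPT partition numbers start at 1")
    guid = str(uuid.UUID(new_partuuid))  # validate and normalise to lowercase
    return ["sgdisk", f"--partition-guid={partnum}:{guid}", disk]


def set_partuuid(disk: str, partnum: int, new_partuuid: str, dry_run: bool = True) -> list:
    cmd = sgdisk_set_partuuid_cmd(disk, partnum, new_partuuid)
    if dry_run:
        print("would run:", " ".join(cmd))
    else:
        subprocess.run(cmd, check=True)  # requires root; raises on failure
    return cmd
```

Two cautions: only rewrite the inactive (target) rootfs partition, and make sure no other partition already carries the same PARTUUID, since duplicates make `root=PARTUUID=` ambiguous. The kernel may also need to re-read the partition table (e.g. `partprobe`) before `blkid` reflects the new value.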