Bootloader capsule update from L4T 35.5.0 to 36.5.0 not applied with last_attempt_status 6163

We need to update our devices with ROOTFS_AB=1 (Orin Nano 8GB with custom base board) from 35.5.0 to 36.5.0 in order to get the rce-fw fix for the 49-day bug.

The procedure for this is:

  1. update the rootfs in the other slot’s partition to our image with 36.5.0
  2. copy the bootloader capsule to the ESP partition
  3. set the OsIndications UEFI variable
  4. copy the new EFI launcher (BOOTAA64.efi) to the ESP partition
  5. reboot
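Steps 2 to 4 can be sketched as a small bash helper. This is a minimal, hypothetical sketch rather than our actual tooling: it assumes the ESP is already mounted (the mount point is passed in), that the launcher lives at the standard removable-media path EFI/BOOT/BOOTAA64.efi, and it writes OsIndications with the same 12-byte payload used in NVIDIA's "Manually Trigger the Capsule Update" instructions (4 attribute bytes followed by the 64-bit value 0x4). Step 1 (writing the rootfs) and step 5 (reboot) are left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper for steps 2-4. EFIVARS can be overridden for testing.
EFIVARS="${EFIVARS:-/sys/firmware/efi/efivars}"
# OsIndications is defined under the EFI global variable GUID.
OSIND_VAR="OsIndications-8be4df61-93ca-11d2-aa0d-00e098032b8c"

# stage_capsule_update ESP_MOUNT CAPSULE_FILE LAUNCHER_FILE
stage_capsule_update() {
  local esp="$1" capsule="$2" launcher="$3" tmp

  # Step 2: UEFI looks for capsules in \EFI\UpdateCapsule on the ESP.
  mkdir -p "$esp/EFI/UpdateCapsule" || return 1
  cp "$capsule" "$esp/EFI/UpdateCapsule/" || return 1

  # Step 3: 4 attribute bytes (0x07 = NV | BS | RT), then the little-endian
  # UINT64 value 0x4 (EFI_OS_INDICATIONS_FILE_CAPSULE_DELIVERY_SUPPORTED).
  tmp="$(mktemp)" || return 1
  printf '\007\000\000\000\004\000\000\000\000\000\000\000' > "$tmp"
  # efivarfs expects the whole variable in a single write(), hence bs=12.
  dd if="$tmp" of="$EFIVARS/$OSIND_VAR" bs=12 count=1 2>/dev/null
  local rc=$?
  rm -f "$tmp"
  [ "$rc" -eq 0 ] || return 1

  # Step 4: replace the launcher with the 36.5.0 one (path is an assumption).
  cp "$launcher" "$esp/EFI/BOOT/BOOTAA64.efi" || return 1
  sync
}
```

On the device this would be run from a root shell, e.g. `stage_capsule_update /boot/efi rc_visard_ng-bl-36.5.0.cap BOOTAA64.efi`, where /boot/efi as the ESP mount point is likewise an assumption.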

Now this works perfectly fine on devices that come from our factory (i.e. fully flashed with 35.5.0), starting from slot A.
I can see in the serial console that the QSPI is flashed with the bootloader from the capsule, then the device switches to slot B and picks up the new EFI launcher, which boots into our updated rootfs, and everything runs as expected.
From the updated slot B I can also successfully update slot A to 36.5.0 using exactly the same capsule and procedure.

The problem is that if I take another fresh device and only switch boot slots via nvbootctrl set-active-boot-slot 1, then wait until it’s fully booted and ready (so nvbootctrl verify has already run), the capsule is not applied, and then the boot of course gets stuck because the 35.5.0 bootloader can’t use the new L4T EFI launcher.
When I then manually copy the old EFI launcher back, it boots into slot B/1 again and the ESRT contains:

==> /sys/firmware/efi/esrt/entries/entry0/capsule_flags <==
0x0

==> /sys/firmware/efi/esrt/entries/entry0/fw_class <==
bf0d4599-20d4-414e-b2c5-3595b1cda402

==> /sys/firmware/efi/esrt/entries/entry0/fw_type <==
1

==> /sys/firmware/efi/esrt/entries/entry0/fw_version <==
2295040

==> /sys/firmware/efi/esrt/entries/entry0/last_attempt_status <==
6163

==> /sys/firmware/efi/esrt/entries/entry0/last_attempt_version <==
0

==> /sys/firmware/efi/esrt/entries/entry0/lowest_supported_fw_version <==
2295040
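To read these numbers, a small decoder helps. Assumption, inferred only from the values above rather than from NVIDIA documentation: fw_version packs the L4T release as major<<16 | minor<<8 | patch, which fits exactly (2295040 = 0x230500 = 35.5.0). The status classification only uses the UEFI specification's vendor range for LastAttemptStatus (0x1000 to 0x4000); what 6163 (0x1813) specifically means is defined by NVIDIA's firmware and is not decoded here.

```shell
#!/usr/bin/env bash
# Assumption: version = major<<16 | minor<<8 | patch (fits 2295040 = 35.5.0).
decode_fw_version() {
  local v=$(( $1 ))
  printf '%d.%d.%d\n' $(( (v >> 16) & 0xff )) $(( (v >> 8) & 0xff )) $(( v & 0xff ))
}

# UEFI reserves LastAttemptStatus 0x1000..0x4000 for vendor-specific errors.
describe_status() {
  local s=$(( $1 ))
  if (( s == 0 )); then
    echo "success"
  elif (( s >= 0x1000 && s <= 0x4000 )); then
    printf 'vendor-specific error 0x%x\n' "$s"
  else
    printf 'generic error %d\n' "$s"
  fi
}

# show_esrt_entry [ENTRY_DIR]: print the decoded fields of one ESRT entry.
show_esrt_entry() {
  local e="${1:-/sys/firmware/efi/esrt/entries/entry0}"
  echo "fw_version: $(decode_fw_version "$(cat "$e/fw_version")")"
  echo "lowest_supported_fw_version: $(decode_fw_version "$(cat "$e/lowest_supported_fw_version")")"
  echo "last_attempt_status: $(describe_status "$(cat "$e/last_attempt_status")")"
}
```

Under this assumption the capsule built from 36.5.0 would carry 0x240500 (2360576), so "incorrect version" is not the obvious explanation; 6163 falls into the firmware-specific range instead.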

The same also happens if I switch back to A again before applying the capsule (6163, capsule not applied, and still in slot A with the incompatible EFI launcher).

What is going on here? How can I make this work for devices we already have in the field, where we have already updated the rootfs several times and hence switched boot slots multiple times?

In all cases the TNSPEC in /etc/nv_boot_control.conf is the same:

TNSPEC 3767-300-0003-S.1-1-1-rc_visard_ng-
COMPATIBLE_SPEC 3767--0003--1--rc_visard_ng-
TEGRA_LEGACY_UPDATE false
TEGRA_BOOT_STORAGE nvme0n1
TEGRA_EMMC_ONLY false
TEGRA_CHIPID 0x23
TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0
TEGRA_OTA_GPT_DEVICE /dev/mtdblock0

and matches what is in the efi var:

sudo cat /sys/firmware/efi/efivars/TegraPlatformSpec-781e084c-a330-417c-b678-38e696380cb9 
3767-300-0003-S.1-1-1-rc_visard_ng-
sudo cat /sys/firmware/efi/efivars/TegraPlatformCompatSpec-781e084c-a330-417c-b678-38e696380cb9 
3767--0003--1--rc_visard_ng-
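A byte-exact version of this comparison has to skip the 4-byte attribute header that efivarfs puts in front of every variable's data (those bytes are not printable, which is why cat seems to show only the string). A minimal sketch, assuming both TegraPlatform* variables hold plain ASCII, possibly NUL-terminated, and that nv_boot_control.conf uses the "KEY value" layout shown above:

```shell
#!/usr/bin/env bash
TEGRA_GUID=781e084c-a330-417c-b678-38e696380cb9

# Print an efivarfs variable's data without the 4-byte attribute header.
read_efivar_string() {
  tail -c +5 "$1" | tr -d '\000'
}

# conf_value KEY FILE: print the value of a "KEY value" line.
conf_value() {
  awk -v k="$1" '$1 == k { print $2; exit }' "$2"
}

# check_tnspec [CONF] [EFIVARS_DIR]: returns 1 if any spec differs.
check_tnspec() {
  local conf="${1:-/etc/nv_boot_control.conf}"
  local vars="${2:-/sys/firmware/efi/efivars}"
  local pair key var want have rc=0
  for pair in TNSPEC:TegraPlatformSpec COMPATIBLE_SPEC:TegraPlatformCompatSpec; do
    key="${pair%%:*}"
    var="${pair#*:}"
    want="$(conf_value "$key" "$conf")"
    have="$(read_efivar_string "$vars/$var-$TEGRA_GUID")"
    if [ "$want" = "$have" ]; then
      echo "$key: match ($want)"
    else
      echo "$key: MISMATCH conf='$want' efivar='$have'"
      rc=1
    fi
  done
  return "$rc"
}
```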

The BUP in the capsule contains a bootloader with nearly the same TNSPEC (just 0 instead of 1 in the field before the board name).
Apparently that has something to do with the slot? But none of the BUPs have a 1 there…
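Lining the strings up field by field makes such differences easier to spot. A minimal sketch that only splits on "-" and reports the positions that differ; it deliberately does not name the fields, since their exact meaning is not spelled out in this thread. For the platform TNSPEC versus the BUP entries below, it reports field 6 (the 1 vs 0 mentioned above) and also field 4 (S.1 vs empty).

```shell
#!/usr/bin/env bash
# diff_tnspec SPEC_A SPEC_B: print every '-'-separated field that differs.
diff_tnspec() {
  local -a a b
  local n i
  IFS=- read -r -a a <<< "$1"
  IFS=- read -r -a b <<< "$2"
  n=${#a[@]}
  if (( ${#b[@]} > n )); then n=${#b[@]}; fi
  for (( i = 0; i < n; i++ )); do
    if [[ "${a[i]-}" != "${b[i]-}" ]]; then
      printf 'field %d: %s | %s\n' "$((i + 1))" "${a[i]:-<empty>}" "${b[i]:-<empty>}"
    fi
  done
}
```

For example, `diff_tnspec 3767-300-0003-S.1-1-1-rc_visard_ng- 3767-300-0003--1-0-rc_visard_ng-` prints `field 4: S.1 | <empty>` and `field 6: 1 | 0`.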

BLOB PATH:
Linux_for_Tegra/bootloader/payloads_t23x/bl_only_payload

BLOB HEADER:
       Magic: NVIDIA__BLOB__V3
     Version: v3.1-2022.6-0 (0x01030622)
   Blob Size: 10,981,541 bytes
 Header Size: 40 bytes
 Entry Count: 30 partition(s)
        Type: 0 (0 for update, 1 for BMP)
 Uncompressed Blob Size: 10,981,541 bytes
   Accessory: Not Present

ENTRY TABLE:
|       part_name       |  offset  | part_size | version | op_mode |              tnspec              | 
|                   BCT |     5560 |      8192 |   3650  |    2    | 3767-300-0003--1-0-rc_visard_ng- | 
|                 BCT_A |    13752 |      8192 |   3650  |    2    | 3767-300-0003--1-0-rc_visard_ng- | 
|                 BCT_B |    21944 |      8192 |   3650  |    2    | 3767-300-0003--1-0-rc_visard_ng- | 
|                   mb1 |    30136 |    283120 |   3650  |    2    | 3767-300-0003--1-0-rc_visard_ng- | 
|               psc_bl1 |   313256 |    139264 |   3650  |    2    |                                  | 
|               MB1_BCT |   452520 |     17664 |   3650  |    0    | 3767-300-0003--1-0-rc_visard_ng- | 
|               MEM_BCT |   470184 |    243712 |   3650  |    0    | 3767-300-0003--1-0-rc_visard_ng- | 
|               tsec-fw |   713896 |    192512 |   3650  |    0    |                                  | 
|                 nvdec |   906408 |    294912 |   3650  |    2    |                                  | 
|                   mb2 |  1201320 |    440880 |   3650  |    0    | 3767-300-0003--1-0-rc_visard_ng- | 
|               xusb-fw |  1642200 |    164864 |   3650  |    2    |                                  | 
|               bpmp-fw |  1807064 |   1027072 |   3650  |    2    | 3767-300-0003--1-0-rc_visard_ng- | 
|           bpmp-fw-dtb |  2834136 |    294528 |   3650  |    0    | 3767-300-0003--1-0-rc_visard_ng- | 
|                psc-fw |  3128664 |    310768 |   3650  |    2    |                                  | 
|               mts-mce |  3439432 |    187120 |   3650  |    2    |                                  | 
|                   sc7 |  3626552 |    187168 |   3650  |    2    |                                  | 
|                 pscrf |  3813720 |    139264 |   3650  |    2    |                                  | 
|                 mb2rf |  3952984 |    122688 |   3650  |    0    |                                  | 
|        cpu-bootloader |  4075672 |   3184864 |   3650  |    0    | 3767-300-0003--1-0-rc_visard_ng- | 
|             secure-os |  7260536 |   1932448 |   3650  |    0    |                                  | 
|                   eks |  9192984 |      9232 |   3650  |    0    |                                  | 
|                dce-fw |  9202216 |    792368 |   3650  |    0    | 3767-300-0003--1-0-rc_visard_ng- | 
|                spe-fw |  9994584 |    270336 |   3650  |    0    |                                  | 
|                rce-fw | 10264920 |    458096 |   3650  |    0    |                                  | 
|               adsp-fw | 10723016 |    124832 |   3650  |    0    |                                  | 
|                pva-fw | 10847848 |     67024 |   3650  |    0    |                                  | 
| BCT-boot-chain_backup | 10914872 |     32768 |   3650  |    2    | 3767-300-0003--1-0-rc_visard_ng- | 
|  secondary_gpt_backup | 10947640 |     16896 |   3650  |    0    | 3767-300-0003--1-0-rc_visard_ng- | 
|                   VER | 10964536 |       109 |   3650  |    0    | 3767-300-0003--1-0-rc_visard_ng- | 
|         secondary_gpt | 10964645 |     16896 |   3650  |    0    | 3767-300-0003--1-0-rc_visard_ng- | 

So to me it seems like the bup/capsule and procedure is fine, just for some reason silently rejected (no error in the serial console) if we already updated the rootfs before (or even only switched slots).
Since last_attempt_status is set to 6163 and the capsule is also deleted from the ESP partition, OsIndications was clearly set correctly, and UEFI found the capsule and deleted it.
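The same reasoning can be turned into a quick post-reboot check. This is a sketch that assumes the capsule file name used when staging it and the ESRT entry0 layout shown above; it only distinguishes "never picked up" from "picked up and rejected" and cannot explain the rejection itself.

```shell
#!/usr/bin/env bash
# check_capsule_result ESP_MOUNT CAPSULE_NAME [ESRT_ENTRY_DIR]
# Returns 0 if consumed with status 0, 1 otherwise, 2 if ESRT is unreadable.
check_capsule_result() {
  local esp="$1" cap="$2"
  local entry="${3:-/sys/firmware/efi/esrt/entries/entry0}"
  local status
  status="$(cat "$entry/last_attempt_status")" || return 2
  if [ -e "$esp/EFI/UpdateCapsule/$cap" ]; then
    # Capsule still on the ESP: UEFI never processed it (trigger problem).
    echo "not consumed: UEFI did not process the capsule"
    return 1
  elif [ "$status" -ne 0 ]; then
    # Capsule removed but a failure recorded: seen and rejected.
    echo "consumed but rejected: last_attempt_status=$status"
    return 1
  fi
  echo "consumed, last_attempt_status=0, fw_version=$(cat "$entry/fw_version")"
}
```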

Also the slots are all normal, and retry_count is at the configured max:

sudo nvbootctrl dump-slots-info
Current version: 35.5.0
Capsule update status: 0
Current bootloader slot: A
Active bootloader slot: A
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal
sudo nvbootctrl -t rootfs dump-slots-info
Current rootfs slot: A
Active rootfs slot: A
num_slots: 2
slot: 0,             retry_count: 1,             status: normal
slot: 1,             retry_count: 1,             status: normal

I haven’t been able to find any difference between the cases where it works and where it fails, except that slots were switched at least once in the cases where it fails.

We’re out of ideas and need help here.

*** Please note that this reply is generated by LLM automatically ***

The issue you’re experiencing seems to be related to the way the UEFI firmware handles the Capsule update process, particularly when the boot slots have been switched multiple times.

From your description, it appears that the Capsule update is not being applied when the device has already updated the rootfs several times and switched boot slots multiple times. This is despite the fact that the OsIndications UEFI variable is set correctly, and the Capsule is deleted from the ESP partition.

One possible explanation for this behavior is that the UEFI firmware is checking the TNSPEC field in the bootloader image contained in the Capsule, and rejecting the update if it doesn’t match the expected value. The TNSPEC field is used to identify the target device and board configuration, and it’s possible that the value in the Capsule is not matching the expected value due to the slot switching.

The fact that the last_attempt_status is set to 6163 and the Capsule is deleted from the ESP partition suggests that the UEFI firmware is indeed attempting to apply the Capsule update, but is rejecting it for some reason.

To resolve this issue, you may need to modify the Capsule update process to take into account the slot switching. Here are a few possible solutions you could try:

  1. Verify the TNSPEC field: Double-check the TNSPEC field in the bootloader image contained in the Capsule, and ensure that it matches the expected value for the target device and board configuration.
  2. Use a different Capsule update method: Instead of using the OsIndications UEFI variable to trigger the Capsule update, try using the FmpCapsuleSinglePartitionChain variable to specify the boot chain to be updated. This may allow you to bypass the slot switching issue.
  3. Update the bootloader image: Try updating the bootloader image in the Capsule to match the expected TNSPEC value for the target device and board configuration.
  4. Reset the boot slots: Before applying the Capsule update, try resetting the boot slots to their default values using the nvbootctrl command. This may help to ensure that the UEFI firmware is in a consistent state before applying the update.

In terms of comparing the cases where the Capsule update works and where it fails, it’s possible that the key difference is the number of times the boot slots have been switched. If the device has already updated the rootfs several times and switched boot slots multiple times, it’s possible that the UEFI firmware is becoming confused about the current boot slot and is rejecting the Capsule update as a result.

To troubleshoot this issue further, you may want to try using the nvbootctrl command to dump the current boot slot information and verify that it matches the expected value. You can also try using the efibootmgr command to verify the boot order and ensure that the UEFI firmware is configured correctly.

Overall, it’s clear that the issue is related to the way the UEFI firmware handles the Capsule update process, particularly when the boot slots have been switched multiple times. By trying the solutions outlined above and verifying the boot slot information, you should be able to resolve the issue and successfully apply the Capsule update to your devices.


The LLM answer is no good here…

hello flixr,

May I have your steps for generating the OTA payload and running the OTA update?
The OTA payload image should be created based on the same partition layout, and you must pass ROOTFS_AB=1 on the command line.
Besides that, you’ll need to update l4t_generate_ota_package.sh and ota_board_specs.conf to create OTA payloads for a custom board.

All the flash and bup builds run with the same env vars.
I built the bup with

sudo ROOTFS_AB=1 ROOTFS_RETRY_COUNT_MAX=2 BOARDID=3767 FAB=300 BOARDSKU=0003 CHIP_SKU="00:00:00:D5" SKIP_EEPROM_CHECK=1 ./build_l4t_bup.sh --clean-up rc_visard_ng internal
sudo ROOTFS_AB=1 ROOTFS_RETRY_COUNT_MAX=2 BOARDID=3767 FAB=300 BOARDSKU=0003 CHIP_SKU="00:00:00:D5" SKIP_EEPROM_CHECK=1 ./build_l4t_bup.sh rc_visard_ng internal

and the capsule with

 sudo ./generate_capsule/l4t_generate_soc_capsule.sh -i bootloader/payloads_t23x/bl_only_payload -o rc_visard_ng-bl-$L4T_VERSION.cap t234

And I posted the rest of the steps already above.

hello flixr,

Let me confirm:
are you doing only a bootloader update (without rootfs) from r35.5.0 to r36.5.0?

No, I’m updating all: rootfs, bootloader and L4T launcher (BOOTAA64.efi).
But I’m not using the Nvidia OTA scripts, as we have our own mechanism.
See the steps outlined in the first post…

hello flixr,

Please refer to the Over-the-Air Update documentation to move forward; we’ve tested upgrading from r35.5.0 to r36.5.0 locally.

Did you also test this on a system that had the rootfs slots switched before doing the update?

We can’t use the nvidia scripts directly as there is not enough space to download the whole rootfs image to a scratch space (UDA partition). But that is not the problem anyway.
I studied the OTA scripts and as far as I can tell, they do exactly the same as I wrote above, right?
(there is no partition layout change).

Any idea why it works if the board was newly flashed, but no longer once the rootfs slot was changed (and verified) at least once?
This seems to be a bug in the older 35.5.0 UEFI, with the TNSPEC/compatible matching relaxed in later versions. I don’t have the same problem when I run a capsule update from a bootloader/UEFI that is already on 36.5.0.

How can we continue here? Any way to further debug this and find working solution?

Let me clarify the flow:
– Image-based OTA update includes updating the rootfs and updating the bootloader. The rootfs is updated before the bootloader.
– Updating the rootfs might be executed in the recovery kernel (rootfs A/B disabled) or in the normal kernel (rootfs A/B enabled).
– Updating the bootloader is executed in UEFI. Once the rootfs update is finished, the device reboots, and UEFI then updates the bootloader through a UEFI capsule update.
– Once the UEFI capsule update is finished, the device reboots into the updated chain; otherwise, it reboots into the original chain.

Yes, as I wrote in the first post:

  1. The rootfs in the other slot is updated first (directly written to that partition)
  2. then the esp partition is mounted and the capsule is placed in the UpdateCapsule dir
  3. OsIndications are set
  4. the EFI launcher in the esp partition is replaced with the newer version from 36.5
  5. dump-slots-info still shows everything normal and current and active bootloader slot: A, last_attempt_status is 0, all as expected
  6. reboot
  7. the “old” uefi from 35.5.0 sees the capsule, rejects it with last_attempt_status 6163, deletes it and hence does NOT switch the bootslots.
  8. … then, since the capsule/BUP was NOT applied, slot A also doesn’t boot anymore (the EFI launcher from 36.5.0 can’t be used with bootloader/UEFI 35.5.0). This is annoying, but expected as a follow-up error.

The problem is step 7. Why is the capsule/BUP rejected?
And why is it only rejected if boot slots were switched before, but works if the whole system was freshly flashed with 35.5.0?

I also tried to apply the capsule from the uefi console via CapsuleApp.efi rc_visard_ng-bl-36.5.0.cap and that worked just fine. So I suppose that circumvents some of the checks that are done if uefi is triggered to do that automatically via OsIndications.

hello flixr,

Please double-check the “Manually Trigger the Capsule Update” section. Could you also share the bootloader logs showing the capsule update status?

I already triple checked the capsule update trigger:

  • OsIndications is set correctly
  • UEFI obviously “sees” and checks/parses the capsule, as /sys/firmware/efi/esrt/entries/entry0/last_attempt_status is no longer 0 but reports 6163
  • the capsule is deleted → also shows that it was “seen” by uefi

I didn’t see anything about the capsule in the bootloader/uefi logs when it fails (uefi release build apparently does not log any warning/error there). Only in the case when it works, I can see the “Update Progress” while the BUP is flashed to qspi.
The output of nvbootctrl dump-slots-info I already posted in the first post. How can I get more details/logs?

hello flixr,

Please set up a serial console to gather the bootloader logs. You’ll need a serial console cable connected to TXD/RXD/GND of the debug UART.
See also: Jetson Nano & NX Style - Serial Debug Console - JetsonHacks

As already mentioned there was nothing in the bootloader logs from UEFI in the failure case. Only in the successful case it shows the “Update Progress” bar.

But after more testing I found out that it only fails if the current slot was not rebooted at least once.
After diffing all esrt entries and efivars I found that there is the efivar BootChainFwNext if the system booted into the current slot for the first time (even if nvbootctrl verify was already called and all slots report as normal).
If this variable exists (on 35.5.0), the capsule update will fail with 6163. If I either reboot or simply delete the file before copying the capsule and setting OsIndications and then reboot to let UEFI apply the capsule it works.
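The efivar comparison that turned this up can be repeated on other units with something like the following sketch (run as root, since some variables are not world-readable). It records each variable name together with a checksum of its contents, so new, removed, and changed variables all show up in a plain diff.

```shell
#!/usr/bin/env bash
# snapshot_efivars [DIR]: one "NAME CRC" line per variable, sorted by name.
snapshot_efivars() {
  local dir="${1:-/sys/firmware/efi/efivars}" f
  for f in "$dir"/*; do
    [ -e "$f" ] || continue
    printf '%s %s\n' "${f##*/}" "$(cksum < "$f" | cut -d' ' -f1)"
  done | sort
}
```

Typical use: `snapshot_efivars > first-boot.txt` on the first boot into the new slot, then after another reboot `snapshot_efivars > second-boot.txt` and `diff first-boot.txt second-boot.txt`.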

Once it booted into the newer 36.5.0 bootloader slot, I can immediately update the other slot again via the same capsule just fine without deleting the BootChainFwNext var. So this restriction does not seem to apply to L4T 36.x anymore.
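Until this is documented, the workaround can be folded into the update procedure right before the capsule is copied. This is a minimal sketch of the workaround described above, not an NVIDIA-endorsed fix, and it should run from a root shell. The variable's GUID is not stated in this thread, so it is matched by name with a glob. The version check parses the "Current version:" line of nvbootctrl dump-slots-info, so the deletion only happens on a 35.x bootloader, since 36.x did not need it in these tests. Because efivarfs typically creates vendor variables as immutable, the immutable flag is cleared first.

```shell
#!/usr/bin/env bash
# Reads `nvbootctrl dump-slots-info` output on stdin, prints e.g. "35".
l4t_bootloader_major() {
  sed -n 's/^Current version: *\([0-9]*\)\..*/\1/p' | head -n 1
}

# clear_bootchain_fw_next [EFIVARS_DIR]: delete any BootChainFwNext variable.
clear_bootchain_fw_next() {
  local vars="${1:-/sys/firmware/efi/efivars}" f
  for f in "$vars"/BootChainFwNext-*; do
    [ -e "$f" ] || continue
    # efivarfs usually marks vendor variables immutable; clear that first.
    chattr -i "$f" 2>/dev/null || true
    rm -f "$f" || return 1
    echo "removed ${f##*/}"
  done
}
```

Usage: `[ "$(nvbootctrl dump-slots-info | l4t_bootloader_major)" = 35 ] && clear_bootchain_fw_next`, followed by the normal capsule staging and reboot.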

This is not documented anywhere (at least I could not find it) and now cost me many many days to figure out!
Please document such essential things and just print a warning in UEFI if it rejects a capsule with the reason instead of just silently failing and leaving behind an essentially bricked device!