UEFI bootloader not updated during OTA update from 35.5.0 to 36.4.3

Hello, I’m doing a full OTA update from 35.5.0 to 36.4.3 on an Orin AGX with eMMC, but the UEFI bootloader is not updated, even though the update capsule is present in the payload. As a result, the device doesn’t boot after the reboot. I have a custom carrier board, but the UEFI bootloader is built directly from upstream sources without adaptation.

This is how we perform the payload build and update.

Payload:

export BASE_BSP=<path_to_35.5.0>/Linux_for_Tegra
export TARGET_BSP=<path_to_36.4.3>/Linux_for_Tegra
cd $TARGET_BSP
sudo BASE_BSP=$BASE_BSP TARGET_BSP=$TARGET_BSP ./tools/ota_tools/version_upgrade/l4t_generate_ota_package.sh orin-agx R35-5

OTA:

apt update && apt install -y efibootmgr nvme-cli
cd /home/orin || exit 1
tar -xf ota_tools_R36.4.3_aarch64.tbz2
mkdir -p /ota
mv ota_payload_package.tar.gz /ota
(cd Linux_for_Tegra/tools/ota_tools/version_upgrade && ./nv_ota_start.sh /ota/ota_payload_package.tar.gz && reboot)

After the OTA update from 35.5.0, the UEFI version stays at 202210.3. The device doesn’t boot after the reboot and shows this error:

....L4TLauncher: Unable to locate L4T Support protocol: Not Found
L4TLauncher: Using legacy interface. Support would be deprecated soon!!!
L4TLauncher: Failed to get PlatformResourceInfo

UEFI fails to boot the new image and ends up in the UEFI shell. If I do a clean flash to 36.4.3, the UEFI version is 202412.0, as expected. The EFI partitions and rootfs seem to be updated fine.

Could you please point me to where the issue could be and what we should check? What could be the reason that the kernel and rootfs are updated but UEFI is not?

Thank you.

When I try to apply only the UEFI capsule, nothing happens, and I get this status code:

$ sudo cat /sys/firmware/efi/esrt/entries/entry0/last_attempt_status
6162
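
For anyone comparing this value against firmware logs, it can help to see it in hex next to the other ESRT attributes. A minimal sketch, assuming the standard Linux ESRT sysfs layout; the helper names `to_hex` and `dump_esrt` are made up for illustration:

```shell
# Convert a decimal status code to 0x-prefixed hex.
to_hex() {
    printf '0x%X\n' "$1"
}

# Print fw_class, fw_version and last_attempt_status for every ESRT entry.
# The directory argument defaults to the standard sysfs location.
dump_esrt() {
    dir=${1:-/sys/firmware/efi/esrt/entries}
    for entry in "$dir"/entry*; do
        [ -d "$entry" ] || continue
        status=$(cat "$entry/last_attempt_status")
        printf '%s class=%s version=%s last_attempt_status=%s (%s)\n' \
            "${entry##*/}" \
            "$(cat "$entry/fw_class")" \
            "$(cat "$entry/fw_version")" \
            "$status" "$(to_hex "$status")"
    done
}
```

Run `dump_esrt` as root on the device, since the ESRT attributes are not world-readable. For the status above, `to_hex 6162` prints `0x1812`.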

Hi xvh,

Could you help to clarify if the issue is specific to your custom carrier board?
(i.e. do you have the devkit to reproduce the similar issue?)

Please share the full log when you generate the OTA payload on the host.
And also, the log when you run reboot to trigger the update on the device.

Hello, I am working with xvh on the same issue so I can add some details.

We have not tried this on the devkit so far. It seems the issue would be the same.

We also discovered that the update is actually failing earlier in the process—we just hadn’t noticed it before.

What we’re trying to achieve is:

  • update from 35.4.1 → 35.5.0

  • then from 35.5.0 → 36.4.3

Both updates include a UEFI capsule. On successful updates, we see the UEFI version change:

Before (35.4.1):

$ cat /sys/class/dmi/id/bios_version

202210.3-52cefd4-dirty

After updating to 35.5.0:

$ cat /sys/class/dmi/id/bios_version

202210.4-32eee0ec-dirty

On the failing devices, this version bump never happens. We didn’t notice earlier because 35.5.0 still boots even with the older UEFI.
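
Because the missed version bump is easy to overlook, a post-update check could compare the running UEFI release with the one the payload should have installed. This is only a sketch: the helper names `uefi_release` and `uefi_is` are hypothetical, and it assumes the release is everything before the first `-` in `bios_version`, as in the two strings above:

```shell
# Strip the "-<hash>-dirty" suffix, e.g. 202210.4-32eee0ec-dirty -> 202210.4
uefi_release() {
    printf '%s\n' "${1%%-*}"
}

# Succeed only if the running UEFI release matches the expected one.
# The second argument (file to read) defaults to the DMI sysfs attribute.
uefi_is() {
    expected=$1
    current=$(cat "${2:-/sys/class/dmi/id/bios_version}")
    [ "$(uefi_release "$current")" = "$expected" ]
}
```

After the 35.5.0 update, something like `uefi_is 202210.4 || echo "UEFI capsule was not applied" >&2` would flag the failing devices before the next upgrade step is attempted.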

It seems that the UEFI update works if the OTA payload is created directly from the L4T build and machine used to flash the device. However, that is not our use case. Whenever we try to create the OTA payload from another device (a clean L4T build without flashing), the UEFI capsule gets rejected on the AGX.

We do not have secure boot currently activated, so the capsule should be signed only with the test keys. That should not be an issue.

I will try to capture some logs tomorrow from creation of the OTA payload and also both the successful and unsuccessful update.

Could you elaborate on this use case?
As I understand it, the OTA payload is expected to be created from the BSP package (on the host PC) of the target version.
Does “another device” mean another PC?

Did you connect the Jetson device (in force recovery state) when you generated the OTA payload on the host?

So we did more investigation and I’ve found some leads. I was able to reproduce the issue with a UEFI debug build to get more logs. There seems to be a problem with a missing CRC in the VER partition, which prevents the FMP library from initializing correctly:

FwImageLibProtocolCallback: Got FW Image protocol, Name=mb1
FwImageLibProtocolCallback: Got FW Image protocol, Name=psc_bl1
FwImageLibProtocolCallback: Got FW Image protocol, Name=MB1_BCT
FwImageLibProtocolCallback: Got FW Image protocol, Name=MEM_BCT
FwImageLibProtocolCallback: Got FW Image protocol, Name=tsec-fw
FwImageLibProtocolCallback: Got FW Image protocol, Name=nvdec
FwImageLibProtocolCallback: Got FW Image protocol, Name=mb2
FwImageLibProtocolCallback: Got FW Image protocol, Name=xusb-fw
FwImageLibProtocolCallback: Got FW Image protocol, Name=bpmp-fw
FwImageLibProtocolCallback: Got FW Image protocol, Name=bpmp-fw-dtb
FwImageLibProtocolCallback: Got FW Image protocol, Name=psc-fw
FwImageLibProtocolCallback: Got FW Image protocol, Name=mts-mce
FwImageLibProtocolCallback: Got FW Image protocol, Name=sc7
FwImageLibProtocolCallback: Got FW Image protocol, Name=pscrf
FwImageLibProtocolCallback: Got FW Image protocol, Name=mb2rf
FwImageLibProtocolCallback: Got FW Image protocol, Name=cpu-bootloader
FwImageLibProtocolCallback: Got FW Image protocol, Name=secure-os
FwImageLibProtocolCallback: Got FW Image protocol, Name=smm-fw
FwImageLibProtocolCallback: Got FW Image protocol, Name=eks
FwImageLibProtocolCallback: Got FW Image protocol, Name=dce-fw
FwImageLibProtocolCallback: Got FW Image protocol, Name=spe-fw
FwImageLibProtocolCallback: Got FW Image protocol, Name=rce-fw
FwImageLibProtocolCallback: Got FW Image protocol, Name=adsp-fw
FwImageLibProtocolCallback: Got FW Image protocol, Name=BCT-boot-chain_backup
FwImageLibProtocolCallback: Got FW Image protocol, Name=VER
FwImageLibProtocolCallback: No handles: Not Found
GetFuseSettings: fuse=1, offset=18
GetTnSpec: TegraPlatformCompatSpec=3701-500-0004--1--cti-orin-agx-agx201-00-
MmSendCommBuffer: doing communicate
MmSendCommBuffer: communicate returned: Success
VerPartitionGetVersion: Crc mismatch expected=0x0, received=0x2B6575C7
GetVersionInfo: Failed to parse version info: Volume Corrupt
GetVersionInfo: Version=0x0, Str=(<null string>), Status=Volume Corrupt
FmpDeviceFwImageCallback: GetVersionInfo failed, FMP library will not be initialized: Unsupported
FwImageInstallFmp: no installer
MmSendCommBuffer: doing communicate
MmSendCommBuffer: communicate returned: Success
VerPartitionGetVersion: Crc mismatch expected=0x0, received=0x2B6575C7
GetVersionInfo: Failed to parse version info: Volume Corrupt
GetVersionInfo: Version=0x0, Str=(<null string>), Status=Volume Corrupt
FmpDeviceFwImageCallback: GetVersionInfo failed, FMP library will not be initialized: Unsupported
FwImageInstallFmp: installing FMP
FmpDxe(NVIDIA System Firmware): Variable 7C374309-1649-4682-8BEE-04F3A8399414 FmpVersion
FmpDxe(NVIDIA System Firmware): Variable 7C374309-1649-4682-8BEE-04F3A8399414 FmpLsv
FmpDxe(NVIDIA System Firmware): Variable 7C374309-1649-4682-8BEE-04F3A8399414 LastAttemptStatus
FmpDxe(NVIDIA System Firmware): Variable 7C374309-1649-4682-8BEE-04F3A8399414 LastAttemptVersion
FmpDxe(NVIDIA System Firmware): Variable 7C374309-1649-4682-8BEE-04F3A8399414 FmpState
FmpDxe(NVIDIA System Firmware): GetVersionString() unsupported in FmpDeviceLib.
FmpTegraGetLowestSupportedVersion: Read fmp-lowest-supported-version=2294785: Success
InstallProtocolInterface: 86C77A67-0B97-4633-A187-49104D0685C7 8061BC428
InstallProtocolInterface: 1849BDA2-6952-4E86-A1DB-559A3C479DF1 806104E68
FmpDxe(NVIDIA System Firmware): FmpDeviceLib registration returned EFI_SUCCESS.  Expect FMP to be installed during the BDS/Device connection phase.

Later, other log lines report that the FMP library is not ready and the capsule installation fails:

FSOpen: Open '\EFI\UpdateCapsule\' Success
FSOpen: Open 'TEGRA_BL.Cap' Success
Successfully read capsule file TEGRA_BL.Cap from disk.
GetFileImageInAlphabetFromDir status 0
FSOpen: Open 'TEGRA_BL.Cap' Success
RemoveFileFromDir status 0
BuildGatherList: creating capsule descriptors at 0x31B0A98
ProcessCapsuleImage for FmpCapsule ...
ValidateFmpCapsule ...
ValidateFmpCapsule - Success
ProcessFmpCapsuleImage ...
FmpCapsule: route payload to right FMP instance ...
FMP (0) ImageInfo:
Fmp->SetImage ...
ImageTypeId - BF0D4599-20D4-414E-B2C5-3595B1CDA402, PayloadIndex - 0x0, ImageIndex - 0x1 (UpdateHardwareInstance - 0x0)(ImageCapsuleSupport - 0x1)
FmpDxe(NVIDIA System Firmware): Set variable 7C374309-1649-4682-8BEE-04F3A8399414 FmpState LastAttemptVersion 00000000
FmpDxe(NVIDIA System Firmware): Certificate #2 [8060DCA79..8060DCE6D].
AuthenticateFmpImage - CertType: 4AAFD29D-68DF-49EE-8AA9-347D375665A7
FmpAuthenticatedHandlerPkcs7 - Image: 0x004A3078 - 0x009AE492
FmpAuthenticatedHandlerPkcs7: PASS verification
FmpTegraCheckImage: Capsule update triggered. Image=0x8004A3B7D ImageSize=10148237
FmpTegraCheckImage: FMP library not initialized
FmpDxe(NVIDIA System Firmware): CheckTheImage() - FmpDeviceLib CheckImage failed. Status = Not Ready
FmpDxe(NVIDIA System Firmware): SetTheImage() - Check The Image failed with Not Ready.
FmpDxe(NVIDIA System Firmware): SetTheImage() LastAttemptStatus: 6162.
FmpDxe(NVIDIA System Firmware): Set variable 7C374309-1649-4682-8BEE-04F3A8399414 FmpState LastAttemptStatus 0000181

I was able to read out the VER partition with the flash.sh script, and it really has no CRC. Below is a comparison with the VER partition from a device which was updated successfully. I attach the faulty (“corrupted”) partition.

ver_a_read.bin.zip (391 Bytes)

=== GOOD device (OTA works) ===
Line 1 (  4 bytes): ‘NV4\n’
Line 2 ( 22 bytes): ‘# R36 , REVISION: 4.3\n’
Line 3 ( 35 bytes): ‘BOARDID=3701 BOARDSKU=0004 FAB=500\n’
Line 4 ( 15 bytes): ‘20260226110211\n’
Line 5 (  9 bytes): ‘0x240403\n’
Line 6 ( 24 bytes): ‘BYTES:85 CRC32:BFB238E3\n’
BYTES:    85
Stored:   0xBFB238E3
Computed: 0xBFB238E3
Result: PASS — UEFI will accept capsule

=== BAD device (OTA broken) ===
Line 1 (  4 bytes): ‘NV4\n’
Line 2 ( 22 bytes): ‘# R35 , REVISION: 4.1\n’
Line 3 ( 35 bytes): ‘BOARDID=3701 BOARDSKU=0004 FAB=500\n’
Line 4 ( 15 bytes): ‘20260320082928\n’
Line 5 (  9 bytes): ‘0x230401\n’
Line 6 ( 16 bytes): ‘BYTES:85 CRC32:\n’
CRC32: EMPTY → 0x00000000
Computed:        0x50E3311D
Result: FAIL — UEFI will reject capsule
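
A comparison like the one above can be reproduced on any dump with standard tools. The sketch below assumes the trailer holds a standard zlib-style CRC-32 over the first BYTES bytes of the file; the thread does not state the algorithm, although BYTES:85 does match the length of the first five lines (4 + 22 + 35 + 15 + 9). The function names are made up for illustration:

```shell
# CRC-32 (zlib polynomial) of stdin as 8 uppercase hex digits.
# It is read from the gzip trailer: 4 bytes CRC-32 (little-endian), then 4 bytes length.
crc32_of_stdin() {
    gzip -c | tail -c 8 | od -An -N4 -tx1 |
        awk '{ print toupper($4 $3 $2 $1) }'
}

# Check the "BYTES:<n> CRC32:<hex>" trailer of a VER dump.
# Prints PASS or FAIL and returns non-zero on failure.
check_ver() {
    trailer=$(grep -a '^BYTES:' "$1" | head -n 1)
    bytes=$(printf '%s\n' "$trailer" |
        sed -n 's/^BYTES:\([0-9][0-9]*\) CRC32:.*/\1/p')
    stored=$(printf '%s\n' "$trailer" |
        sed -n 's/^BYTES:[0-9]* CRC32:\([0-9A-Fa-f]*\).*/\1/p' | tr 'a-f' 'A-F')
    [ -n "$bytes" ] || { echo "FAIL: no BYTES trailer"; return 1; }
    # Hash only the payload that precedes the trailer line.
    computed=$(head -c "$bytes" "$1" | crc32_of_stdin)
    if [ "$stored" = "$computed" ]; then
        echo "PASS: CRC32 $computed"
    else
        echo "FAIL: stored='$stored' computed=$computed"
        return 1
    fi
}
```

Calling `check_ver ver_a_read.bin` on the extracted attachment should then report a failure with an empty stored value, provided the assumption about the CRC algorithm holds.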

Two questions:

  • How is it possible that the CRC is missing on some of our devices? What could have caused that?
  • Can we somehow fix this remotely (e.g. over SSH) in production so OTA can be performed?
    • I considered patching UEFI to ignore the VER partition CRC error. Is that a good idea?

Thank you.

Please refer to OTA not updating pinmux - #11 by KevinFFF and check if zlib1g is installed on your host before you generate the capsule payload.

Could you please explain in more detail how this is related? As I wrote, our issue is a missing CRC in the VER partition. The capsule update itself works fine; it is just blocked by the VER partition check.

Can you share the result of $ sudo nvbootctrl dump-slots-info on your board?
We have a known issue where the CRC may be unexpected if zlib1g is not installed on your host before flashing.

Before the OTA it seems just fine:

orin@default:~$ sudo nvbootctrl dump-slots-info
Current version: 35.4.1
Capsule update status: 0
Current bootloader slot: A
Active bootloader slot: A
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal

After the capsule update, the capsule update status is 6162, which matches LAS_ERROR_FMP_LIB_UNINITIALIZED.
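
For fleet monitoring, the status can be pulled out of the `nvbootctrl dump-slots-info` output and flagged automatically. A small sketch, assuming the output format shown above; the function names are hypothetical and only the one code named in this thread is mapped:

```shell
# Extract the capsule update status from `nvbootctrl dump-slots-info`
# output supplied on stdin.
capsule_status() {
    sed -n 's/^Capsule update status:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Name the status code discussed in this thread; anything else is
# reported as unmapped rather than guessed.
describe_status() {
    case $1 in
        6162) echo "LAS_ERROR_FMP_LIB_UNINITIALIZED" ;;
        *)    echo "unmapped ($1)" ;;
    esac
}
```

On a device this could be used as `describe_status "$(sudo nvbootctrl dump-slots-info | capsule_status)"`.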

The most important question is how we can fix the VER partition in the field remotely, over the network only (SSH, OTA). We cannot use a UEFI capsule update because the current bootloader will always reject it. There are already devices in the field, and physical reflashing is expensive.

Thank you for your help.

We are facing the same issue here; a remote fix via OTA or SSH is critical for our fleet.
Looking forward to a technical workaround from the NVIDIA team, Thanks.

I think I have a workaround: a custom EFI binary which writes the correct VER partition content and is triggered by a UEFI shell startup script. I will share more details once we confirm it in production.

Could you share the contents of qspi_bootblob_ver.txt in the BSP package on the host which you used to flash the board?

We are not able to reproduce the flashing issue, and the version txt seems to be fine. But in a previous post I attached the exported VER partition from a broken production device, which proves that the VER partition CRC is wrong.

ver_a_read.bin.zip

qspi_bootblob_ver.txt:

NV4
# R35 , REVISION: 4.1
BOARDID=3701 BOARDSKU=0004 FAB=500
20260324121920
0x230401
BYTES:85 CRC32:2C2E3ECB

So here is a workaround: a custom EFI binary which checks the consistency of the VER partition and fixes the CRC (or overwrites it completely). This was the only way I could find, because QSPI flash is not accessible from user space or the recovery kernel and a UEFI capsule update is not possible. It can be accessed only from the bootloaders and over USB flashing. Be aware that it was built and tested for L4T 35.4.1; it may need some adaptations for other versions.

To close this case: the root cause of the missing CRC was a missing python-is-python3 package, as described in this ticket: OTA update from 32.7.1 to 35.3.1 - #6 by luis21.
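
Given that root cause, a pre-flight check before flashing or building a payload may save others the same trouble. The sketch below rests on two assumptions that the thread does not spell out: that the BSP scripts call a plain `python` command (which python-is-python3 provides on Ubuntu), and that a healthy qspi_bootblob_ver.txt ends in a trailer with eight hex digits, like the one posted above. Both function names are hypothetical:

```shell
# Print every command from the argument list that is not on PATH.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Succeed only if the version file ends with a populated CRC32 field.
ver_file_has_crc() {
    tail -n 1 "$1" | grep -Eq '^BYTES:[0-9]+ CRC32:[0-9A-Fa-f]{8}$'
}
```

On the host, `missing_cmds python` should print nothing, and `ver_file_has_crc` can be pointed at the qspi_bootblob_ver.txt in your BSP tree after flashing. The zlib1g package mentioned earlier in the thread can be checked separately with `dpkg -s zlib1g`.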

Nvidia’s proprietary implementation of embedded best practices never stops surprising me. I hope this solution helps someone avoid the many days of effort we spent on this issue.