Encountering RAS Uncorrectable Error in IOB and ACI

Hi…I’ve been getting this error during secure boot on the Orin NX Jetson board. The error was somewhat intentional: as an experiment, I packaged a corrupted image with a valid signature to see what would happen during secure boot. I was hoping the boot WDT would save the day and switch to the other chain after 3 failed boot attempts. That didn’t happen, and I’m not sure it could have, since the error message says the core was powered down. Here is the error message:

ERROR: Exception reason=1 syndrome=0x82000010
ERROR: **************************************
ERROR: RAS Uncorrectable Error in IOB, base=0xe010000:
ERROR: Status = 0xe4000612
ERROR: SERR = Error response from slave: 0x12
ERROR: IERR = CBB Interface Error: 0x6
ERROR: MISC0 = 0xc4524040
ERROR: MISC1 = 0x2f4c870000000000
ERROR: MISC2 = 0x0
ERROR: MISC3 = 0x0
ERROR: ADDR = 0x8000000000000200
ERROR: **************************************
ERROR: sdei_dispatch_event returned -1
ERROR: **************************************
ERROR: RAS Uncorrectable Error in ACI, base=0xe01a000:
ERROR: Status = 0xe8000904
ERROR: SERR = Assertion failure: 0x4
ERROR: IERR = FillWrite Error: 0x9
ERROR: Overflow (there may be more errors) - Uncorrectable
ERROR: ADDR = 0x8000000000000200
ERROR: **************************************
ERROR: sdei_dispatch_event returned -1
ERROR: Powering off core
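
For anyone reading the two Status words by hand, the SERR and IERR values printed in the log appear to be bit fields of Status. Below is a minimal sketch, assuming the Arm RAS extension ERR<n>STATUS layout (SERR in bits [7:0], IERR in bits [15:8], MV in bit 26, OF in bit 27, UE in bit 29). The function name decode_ras_status is made up for illustration and is not part of any NVIDIA tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a RAS status word into the fields the log prints.
# Assumed layout (Arm RAS ERR<n>STATUS): SERR=[7:0], IERR=[15:8],
# MV=bit 26 (MISC registers valid), OF=bit 27 (overflow), UE=bit 29 (uncorrected).
decode_ras_status() {
    local status=$(( $1 ))
    printf 'SERR=0x%x IERR=0x%x UE=%d OF=%d MV=%d\n' \
        $(( status & 0xff )) \
        $(( (status >> 8) & 0xff )) \
        $(( (status >> 29) & 1 )) \
        $(( (status >> 27) & 1 )) \
        $(( (status >> 26) & 1 ))
}

decode_ras_status 0xe4000612   # IOB record from the log above
decode_ras_status 0xe8000904   # ACI record from the log above
```

Under that assumed layout, the IOB word decodes to SERR 0x12 / IERR 0x6 with MV set (which would explain why MISC0..MISC3 are printed), and the ACI word decodes to SERR 0x4 / IERR 0x9 with OF set, which lines up with the "Overflow" line in the log.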

Any ideas as to how an error of this type can be recovered from?

Thx!

Alin

*** Please note that this reply is generated by LLM automatically ***

The error message you’re seeing indicates that the Jetson Orin NX board has encountered an uncorrectable error during the secure boot process. This error is likely due to the corrupted image you intentionally packaged with a valid signature.

The error message suggests that the error occurred in the IOB (Input/Output Block) and ACI (Accelerated Chip Interface) components of the SoC. The specific error codes and addresses provided in the error message can help identify the root cause of the issue.

In this case, the error is not recoverable, and the core is powered down to prevent further damage. The Boot Watchdog Timer (BWT) is not triggered in this scenario because the error occurs before the boot process can even start.

To recover from this type of error, you’ll need to reflash the board with a valid and non-corrupted image. You can use the flash.sh command with the -v option to specify the SBK key file, which is used to encrypt the Bootloader components.

Here are the general steps to recover from this error:

  1. Prepare a valid and non-corrupted image for the Jetson Orin NX board.
  2. Use the flash.sh command with the -v option to specify the SBK key file.
  3. Reflash the board with the new image.

Example command:

$ sudo ./flash.sh -v <sbk_key_file> <board> mmcblk0p1

Replace <sbk_key_file> with the path to your SBK key file, and <board> with the configuration of the target device (e.g., jetson-orin-nano-devkit).

Note that if you’re using a Fuse Configuration XML file, you’ll need to update the SecureBootKey field with the correct SBK key value.

In comparison to other Jetson boards, the Orin NX has a similar secure boot process, but the specific error messages and recovery procedures may vary. For example, the Jetson AGX Xavier board has a different secure boot architecture, and the error messages may be different.

In summary, to recover from this error, you’ll need to reflash the board with a valid and non-corrupted image, using the flash.sh command with the -v option to specify the SBK key file.

This is interesting…it sounds like you are saying that the boot WDT is not active at all in the earlier stages of boot (i.e. MB1, MB2, etc.). If that is the case, exactly when does the boot WDT get armed? Also, just for clarity, I’m referring to the NVIDIA boot WDT (appears to be a 5-minute window), not the Linux WDT (appears to be a 2-minute window).

hello alin.pilkington,

May I know what your authorization scheme is?
For instance, is the module fused with PKC or PKC+SBK? Which other keys have you fused onto this module to enable secure boot?

Please also share your steps in detail so we can cross-check.

Unfortunately, I cannot identify the issue from the information provided so far.
Please set up a serial console and gather the complete UART logs for reference.

Hi Jerry…we are using only PKC (no SBK). I’ve attached the console boot log as requested.

Thx!

Alin

FI-123-PACKAGED-A-UEFI-OFFSET-0x1420-WDT-ON-20260427.log (33.2 KB)

hello alin.pilkington,

May I also confirm your steps? It looks like MB1/MB2 completed and the crash (RAS Uncorrectable Error) happened in UEFI.
FYI, secure boot establishes a root of trust; if you had given the wrong key to the flash commands, it should have reported an error at the very beginning of the image flash stage (i.e. BootROM).

May I also confirm which partition you intentionally corrupted?
See also Bootloader Implementation for the flow chart.

Hi Jerry…thx for taking a look at this. I corrupted the UEFI image before signing it and flashed it into QSPI slot A. The correct keys were given so I don’t think that is the issue.

It’s the A_cpu-bootloader partition, right? Please try the Generating a Specified Partition BUP procedure.

Yes…A_cpu-bootloader. I’ll try it again with a specified partition BUP and get back to you. By the way, a 2nd question I posted was wondering specifically when the Boot Watchdog Timer is armed and running. It seems like it isn’t armed until much later in the boot process than I originally thought.

hello alin.pilkington,

Yes, a WDT timeout should trigger a software reboot for the next trial.
According to your logs, this was the first boot attempt. Did the uncorrectable error keep being reported until the system reset?

Hi Jerry…the system never rebooted which was worrisome for us. Once the hard crash occurred and the core powered off, there was no activity…no reset happened.

Thx!

Alin

hello alin.pilkington,

Please try the capsule update approach to write the faulty binary, so that you can test bootloader update redundancy.

Hi Jerry (NOTE: AI drafted the msg below based on what I’ve done),

Yes, this is the A_cpu-bootloader partition.

I tried the r36.5 Developer Guide flow for Generating a Specified Partition BUP using A_cpu-bootloader.

Command used:

sudo FAB=303 BOARDID=3767 BOARDSKU=0000 BOARDREV= FUSELEVEL=1 CHIPREV= CHIP_SKU="00:00:00:D3" \
  ./build_l4t_bup.sh \
    --single-image-bup A_cpu-bootloader \
    -u secureboot_pkc/pkc_rsa3k.pem \
    jetson-orin-nano-devkit \
    mmcblk0p1

Before running this, I verified that the raw UEFI input being packaged was stock r36.5:

/home/alin.pilkington/nvidia/nvidia_sdk/Linux_for_Tegra/bootloader/uefi_jetson.bin
sha256:
c0bd354a94c28d6d7161cba61a2bebb7df379b9d9ea63efe843e60af0abbba77

This matches the uefi_jetson.bin extracted directly from:

Jetson_Linux_R36.5.0_aarch64.tbz2
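
A check like the one described above can be scripted so it is repeatable before each packaging run. This is only a sketch: verify_sha256 is a hypothetical helper name, and it assumes GNU coreutils sha256sum is available on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE hashes to EXPECTED (hex sha256).
verify_sha256() {
    local file=$1 expected=$2 actual
    # A missing or unreadable file yields an empty hash and therefore a mismatch.
    actual=$(sha256sum "$file" 2>/dev/null | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got ${actual:-no hash})" >&2
        return 1
    fi
}

# Example call, using the path and hash quoted in this post:
#   verify_sha256 Linux_for_Tegra/bootloader/uefi_jetson.bin \
#       c0bd354a94c28d6d7161cba61a2bebb7df379b9d9ea63efe843e60af0abbba77
```

Returning a non-zero status on mismatch lets the check gate the BUP generation step in a wrapper script.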

The specified-partition BUP generation succeeded:

SUCCESS: A_cpu-bootloader_only_payload created

Generated payload:

bootloader/payloads_t23x/A_cpu-bootloader_only_payload
size:   3181056 bytes
sha256: 0ba7d99a016ba93f30522ff1f91562d1097ee7e682e72e28b3ab9d584ae38ce1

The generated signed A_cpu-bootloader artifact inside the BUP generation flow was:

A_cpu-bootloader_stock_generated_uefi_jetson_with_dtb_aligned_blob_w_bin_sigheader.bin.signed
size:   3180832 bytes
sha256: e60d4e85ff2145bc921b2fb106978a3d1be505e4e19548bc8e35c45318466971

So the host-side specified-partition BUP generation path appears to work correctly for A_cpu-bootloader.

However, before applying it on the target, I re-read the r36.5 Developer Guide and noticed the section on Single Partition Capsule Update says that UEFI support for one-partition image update is disabled in the default UEFI build. The docs also say that fwupdtool and nv_bootloader_capsule_updater.sh do not support single-partition image capsule updates.

So I’m not sure what the intended next step is for actually applying this A_cpu-bootloader_only_payload on a standard r36.5 Orin NX / Nano setup.

Should I:

  1. Generate a Capsule from A_cpu-bootloader_only_payload, customize/rebuild UEFI to enable single-partition Capsule update, then manually set the required EFI variables as described in the Developer Guide?

  2. Use a different NVIDIA-supported update path for this test that does not require custom UEFI single-partition Capsule support?

  3. Instead generate a full bootloader BUP / Capsule and apply that through the normal nv_bootloader_capsule_updater.sh path?

The reason I’m asking is that the original failure I’m trying to reproduce involved a corrupted A_cpu-bootloader/UEFI-style artifact that authenticated far enough to execute, then failed later during boot with:

ERROR:   Powering off core

With BWT enabled, the board did not reset or fail over to the B chain after waiting approximately 12 minutes. I’d like to repeat the test using the update method you intended, but I want to avoid using an unsupported single-partition apply path if the default UEFI build will not process it.

Please let me know which apply/update path you want me to use after generating A_cpu-bootloader_only_payload.

hello alin.pilkington,

You should test with an OTA payload package that updates only the bootloader.
For instance, update $OUT/Linux_for_Tegra/bootloader/ with your corrupted image,
then run the OTA script to create an OTA payload that updates the bootloader only (i.e. the -b option).
For example:
$ sudo -E ./tools/ota_tools/version_upgrade/l4t_generate_ota_package.sh --external-device nvme0n1 -b jetson-orin-nano-devkit R36-5

Hi Jerry,

I checked my local Jetson Linux R36.5.0 BSP installation and the SDKM download archive.

The script from your example is not present:

tools/ota_tools/version_upgrade/l4t_generate_ota_package.sh

I also checked the original SDKM archive:

Jetson_Linux_R36.5.0_aarch64.tbz2

It contains tools/ota_tools/version_upgrade/, but only these files:

init
nv_ota_common.func
nv_ota_disk_enc.func
nv_ota_exception_handler.sh
nv_ota_internals.sh
nv_ota_log.sh
nv_ota_utils.func
nv_recovery.sh
ota_make_recovery_img_dtb.sh
recovery_copy_binlist.txt

SDKM also did not download an ota_tools_R36.5.0_aarch64.tbz2 archive.
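
The archive check described above could be done with something like the following. It is a sketch only: archive_contains is a hypothetical helper name, and it assumes a tar that auto-detects compression when listing (GNU tar does, given bzip2 is installed for .tbz2 files).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if ARCHIVE lists a member whose file name
# (the last path component) is exactly NAME.
archive_contains() {
    local archive=$1 name=$2
    tar -tf "$archive" | awk -v n="$name" '
        BEGIN { rc = 1 }
        { sub(/.*\//, ""); if ($0 == n) rc = 0 }   # strip directories, compare basename
        END { exit rc }
    '
}

# Example call, using the names from this post:
#   archive_contains Jetson_Linux_R36.5.0_aarch64.tbz2 l4t_generate_ota_package.sh \
#       && echo present || echo missing
```

Matching on the basename rather than the full path avoids depending on where the script sits inside the archive.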

Can you confirm where I should obtain the R36.5.0 OTA tools package that contains l4t_generate_ota_package.sh?

Also, can you confirm whether bootloader-only image-based OTA using the -b option is supported for Jetson Linux R36.5.0 on Jetson Orin Nano/NX, or whether this requires a newer/different OTA tools package?

Thanks,
Alin

You may refer to the [Jetson Linux Archive], for example the Jetson Linux Release 36.5 | NVIDIA Developer page, to download the [OTA Tools] package.

Hi Jerry,

We tested the R36.5 bootloader-only OTA/capsule flow you suggested using the -b option. The board is a fused Secure Boot / ROOTFS_AB Jetson Orin Nano/NX-class setup booting from NVMe.

The package generation was based on this bootloader-only OTA flow:

sudo -E ./tools/ota_tools/version_upgrade/l4t_generate_ota_package.sh \
  --external-device nvme0n1 \
  -b \
  jetson-orin-nano-devkit \
  R36-5

In our secure-boot setup, we also supplied the PKC and UEFI key options and used the generated bootloader-only OTA payload package on target.

Before running OTA, the board was clean:

Capsule update status: 0
Current bootloader slot: A
Active bootloader slot: A
slot 0: normal
slot 1: normal

Current rootfs slot: A
Active rootfs slot: A
rootfs slot 0: normal, retry_count 3
rootfs slot 1: normal, retry_count 3

We then ran nv_ota_start.sh on target. The script completed staging successfully and reported:

ROOTFS_AB_ENABLED=1
ROOTFS_CURRENT_SLOT=0
UPDATE_SLOT=B
UPDATE_BOOTLOADER=1, UPDATE_ROOTFS=0

trigger_uefi_capsule_update /ota_work /dev/nvme0n1
Copying /ota_work/TEGRA_BL.Cap into /opt/nvidia/esp/EFI/UpdateCapsule
Triggering UEFI capsule update ... OsIndications...
Bootloader on non-current slot(B) is to be updated once device is rebooted

After reboot, the board came back normally on chain A:

Capsule update status: 1
Current bootloader slot: A
Active bootloader slot: A
slot 0: normal
slot 1: normal

rootfs A/B both normal

So at that point, the capsule appeared to have been consumed successfully.

However, since the OTA script said non-current slot B was updated, we then tried to verify B by running:

sudo nvbootctrl set-active-boot-slot 1
sudo reboot

The serial console showed:

Rebooting to new boot chain

But the next successful boot still came up on chain A, and nvbootctrl reported:

Capsule update status: 1
Current bootloader slot: A
Active bootloader slot: A
slot 0: normal
slot 1: unbootable

Rootfs A/B was still healthy:

Current rootfs slot: A
Active rootfs slot: A
slot 0: retry_count 3, status normal
slot 1: retry_count 3, status normal

So the bootloader-only capsule update appears to have made the non-current bootloader slot B unbootable. The board recovered back to A, but B is now marked unbootable.
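
To make the before/after comparison less manual, the slot dump can be filtered for unbootable slots. This is a rough sketch under assumptions: find_bad_slots is a hypothetical helper, and the pattern targets the condensed "slot N: STATUS" lines quoted in this post (it also tolerates a "slot: N, status: ..." style), so it may need adjusting to the exact nvbootctrl output on a given release.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a slot dump on stdin and print every slot line
# that reports "unbootable". Exit status is 1 if any such line was found.
find_bad_slots() {
    awk '
        /^slot[: ]+[0-9]+.*unbootable/ { print; bad = 1 }
        END { exit bad + 0 }
    '
}

# Example call on target:
#   sudo nvbootctrl dump-slots-info | find_bad_slots || echo "at least one slot is unbootable"
```

Running it before the OTA, after the capsule is consumed, and after the set-active-boot-slot reboot would pinpoint the step at which slot B changes state.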

This looks similar to a previous failure we saw where the opposite slot became unbootable after an OTA/capsule test. Our current hypothesis is that the OTA capsule updates the non-current chain, but the updated chain does not boot correctly on this fused Secure Boot / ROOTFS_AB configuration. If the system then attempts to boot that updated chain, it fails and the bootloader slot gets marked unbootable.

Do you have any suggestions for what to inspect next?

Some specific questions:

1. Is "Capsule update status: 1" definitely the success state for this flow?

2. After a bootloader-only OTA capsule update, is the updated non-current bootloader slot expected to be immediately bootable via:

       sudo nvbootctrl set-active-boot-slot <slot>
       sudo reboot

3. Are there additional secure-boot signing/capsule certificate requirements beyond PKC and UEFI DB signing that could make the capsule apply but leave the updated slot unbootable?

4. Is there a recommended way to dump or decode the persistent boot-chain status metadata that marks a slot unbootable?

5. Would you expect a surgical reflash of B_cpu-bootloader to clear this state, or is a full reflash required to reset the persistent boot-chain metadata?

6. Should a bootloader-only OTA package for a fused Secure Boot / ROOTFS_AB NVMe boot system include any additional options or artifacts beyond the standard -b --external-device nvme0n1 flow?

We captured serial logs for the reboot and pre/post nvbootctrl state this time, so we can provide details if useful.

Thanks,
Alin