Error on Coldboot after flashing on Jetson AGX Orin: "E> Top caller module: DRAM_ECC, error module: PARTITION_MANAGER, reason: 0x0d, aux_info: 0x09"

I am working on a Jetson AGX Orin Industrial Dev Kit that uses Secure Boot with partitioning and disk encryption. Currently, I am running everything from the Jetpack version 35.5.0. The board has been fused with RSA, SBK and OEM_K2 keys, and flashing appears to complete, but on cold boot I run into the following error in the UART log.
UART_log.txt (21.1 KB)

No changes have been made to the base keys in the example.sh file used to generate the eks_t234.img, except for the inclusion of the fused OEM_K2 key. I regenerate the eks_t234.img, move it to the bootloader directory and run the flash command. The flashing log says secure flashing completed, but the board then fails on cold boot. I have also provided the UART log for flashing.
UART_flash_log.txt (17.9 KB)

Update:

Note that I generated the eks_t234.img from the following command

python3 gen_ekb.py -chip t234 -oem_k2_key oem_k2_optee.key -fv fv_ekb_t234 -in_sym_key sym_t234.key -in_sym_key2 sym2_t234.key -in_auth_key auth_t234.key -out eks_t234.img

where fv_ekb_t234, sym_t234.key, sym2_t234.key and auth_t234.key are all the baseline keys defined in example.sh, shown here:

#!/bin/bash

# [T194 example]
# This is default KEK2 root key for unfused board
#echo "00000000000000000000000000000000" > kek2.key

# This is the fixed vector for deriving EKB root key from fuse.
# It is expected user to replace the FV below with a user specific
# FV, and code the exact same user specific FV into OP-TEE.
#echo "bad66eb4484983684b992fe54a648bb8" > fv_ekb_t194

# Generate user-defined symmetric key files
# For each key, uncomment the random generate key and comment out the next line for production
# openssl rand -rand /dev/urandom -hex 16 > sym_t194.key
#echo "00000000000000000000000000000000" > sym_t194.key
# openssl rand -rand /dev/urandom -hex 16 > sym2_t194.key
#echo "00000000000000000000000000000000" > sym2_t194.key
# openssl rand -rand /dev/urandom -hex 16 > auth_t194.key
#echo "00000000000000000000000000000000" > auth_t194.key

#python3 gen_ekb.py -chip t194 -kek2_key kek2.key \
#        -fv fv_ekb_t194 \
#        -in_sym_key sym_t194.key \
#        -in_sym_key2 sym2_t194.key \
#        -in_auth_key auth_t194.key \
#        -out eks_t194.img

# [T234 example]
# Fill your OEM_K1 fuse key value
echo "0000000000000000000000000000000000000000000000000000000000000000" > oem_k1.key


# This is the fixed vector for deriving EKB root key from fuse.
# It is expected user to replace the FV below with a user specific
# FV, and code the exact same user specific FV into OP-TEE.
echo "bad66eb4484983684b992fe54a648bb8" > fv_ekb_t234

# Generate user-defined symmetric key files
# For each key, uncomment the random generate key and comment out the next line for production
#openssl rand -rand /dev/urandom -hex 32 > sym_t234.key    # kernel/kernel-dtb encryption key
#echo "0000000000000000000000000000000000000000000000000000000000000000" > sym_t234.key
#openssl rand -rand /dev/urandom -hex 16 > sym2_t234.key   # disk encryption key
echo "f0e0d0c0b0a001020304050607080900" > sym2_t234.key
#openssl rand -rand /dev/urandom -hex 16 > auth_t234.key   # uefi variables authentication key
echo "d9f7b49e3b6264985f1326f541bb43c9" > auth_t234.key

python3 gen_ekb.py -chip t234 -oem_k1_key oem_k1.key \
        -fv fv_ekb_t234 \
        -in_sym_key sym_t234.key \
        -in_sym_key2 sym2_t234.key \
        -in_auth_key auth_t234.key \
        -out eks_t234.img
 #       -in_device_id device_id_cert.der \
 #       -in_ftpm_sn 00000000000000000000 \
 #       -in_ftpm_eps_seed ftpm_eps_seed_file \
 #       -in_ftpm_rsa_ek_cert ftpm_rsa_ek_cert.der \
 #       -in_ftpm_ec_ek_cert ftpm_ec_ek_cert.der \
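
The key files written by the script above are plain hex strings, so a quick length check before generating the image can rule out a malformed or truncated key file. This is a minimal sketch, assuming the lengths implied by the example script (32 hex characters for fv_ekb_t234, sym2_t234.key and auth_t234.key; 64 for sym_t234.key and the oem_k1.key placeholder). check_hex_len is a hypothetical helper, not part of the BSP, and the expected length for oem_k2_optee.key is an assumption.

```shell
#!/bin/bash
# Hypothetical helper (not part of the BSP): verify that a key file holds
# exactly the expected number of hex characters.
check_hex_len() {
    local file="$1" expected="$2" content
    # Strip all whitespace, including the trailing newline written by echo.
    content="$(tr -d '[:space:]' < "$file")" || return 1
    case "$content" in
        ''|*[!0-9a-fA-F]*)
            echo "$file: not a plain hex string" >&2
            return 1
            ;;
    esac
    if [ "${#content}" -ne "$expected" ]; then
        echo "$file: expected $expected hex chars, found ${#content}" >&2
        return 1
    fi
}

# Example usage (lengths taken from the example script above):
#   check_hex_len fv_ekb_t234 32
#   check_hex_len sym_t234.key 64
#   check_hex_len sym2_t234.key 32
#   check_hex_len auth_t234.key 32
#   check_hex_len oem_k2_optee.key 64   # assumption: same length as oem_k1.key
```

The function prints nothing and returns 0 when the file looks right, so it can be chained with && in front of the gen_ekb.py call.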

I have discovered that I am able to flash, but only if I exclude disk encryption altogether (i.e. leaving out ROOTFS_ENC=1 and -i "ekb.key" from the flashing command). This is the command that allowed me to flash the fused device with at least Secure Boot enabled:

sudo ROOTFS_AB=1 ./flash.sh -u rsa_priv.pem -v sbk.key jetson-agx-orin-devkit-industrial mmcblk0p1

Is there something in the way I am generating the eks_t234.img that is preventing me from using disk encryption?

Update 2:

The issue seems to be that I cannot partition the Orin and encrypt it at the same time. I am now able to flash when running the command

sudo ROOTFS_ENC=1 ./flash.sh -u rsa_priv.pem -v sbk.key -i "./ebk.key" jetson-agx-orin-devkit-industrial mmcblk0p1

however when running

sudo ROOTFS_ENC=1 ROOTFS_AB=1 ./flash.sh -u rsa_priv.pem -v sbk.key -i "./ebk.key" jetson-agx-orin-devkit-industrial mmcblk0p1

I get the same issue shown at the end of the UART_log in my first post. Is there an issue with doing both processes at once, or is something wrong in the steps I have taken?

UART_coldboot_log_only_encryption.txt (102.3 KB)

Hi grant.kellogg,

In your latest update, the only difference seems to be whether ROOTFS_AB=1 is added or not.
That results in a different partition layout XML being used.
Could you share the flash log for both cases?

Have you confirmed that you replaced this line with your OEM K1 key?

Sorry, I realized I made a typo in my second update. I am only able to flash when doing either ROOTFS_ENC=1 or ROOTFS_AB=1 separately, but not together. I have edited my post.

I am using gen_ekb.py to generate the eks_t234.img and am using oem_k2 instead of oem_k1. I will post the flash logs here shortly.

It seems my flash logs did not get saved. I will post them incrementally as I reflash the Orin. For now, here is the flash log from running this command:

sudo ROOTFS_ENC=1 ./flash.sh -u rsa_priv.pem -v sbk.key -i "./ebk.key" jetson-agx-orin-devkit-industrial mmcblk0p1

flash_log_only_encryption_and_secureboot.txt (175.7 KB)
I have removed any --key, --iv and --aad information from it
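
For later runs, the output can be captured as it is produced so that a log is never lost. A minimal sketch, assuming bash; run_logged is a hypothetical wrapper and the log file name in the usage comment is only an example.

```shell
#!/bin/bash
# Hypothetical wrapper: run any command, mirror stdout and stderr into a
# log file, and return the command's own exit status rather than tee's.
run_logged() {
    local logfile="$1"
    shift
    "$@" 2>&1 | tee "$logfile"
    return "${PIPESTATUS[0]}"
}

# Example usage (log file name is illustrative):
#   run_logged flash_enc_only.log \
#       sudo ROOTFS_ENC=1 ./flash.sh -u rsa_priv.pem -v sbk.key \
#       -i "./ebk.key" jetson-agx-orin-devkit-industrial mmcblk0p1
```

Returning the first element of PIPESTATUS matters here because a plain pipeline would otherwise report the status of tee, which hides a failed flash.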

Flash log for the command:

sudo ROOTFS_ENC=1 ROOTFS_AB=1 ./flash.sh -u rsa_priv.pem -v sbk.key -i "./ebk.key" jetson-agx-orin-devkit-industrial mmcblk0p1

flash_log_partition_and_encryption.txt (176.4 KB)
I have removed any --key, --iv and --aad information from it
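
The redaction can also be scripted rather than done by hand. This is only a sketch: it assumes each of those options is followed by a single value separated by a space or an equals sign, which may not match every line of a real flash log, so the result should still be reviewed before posting. redact_flash_log is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: replace the single token that follows --key, --iv or
# --aad with a placeholder. Reads the named files, or stdin if none given.
redact_flash_log() {
    sed -E 's/(--(key|iv|aad))([ =]+)[^[:space:]]+/\1\3<redacted>/g' "$@"
}

# Example usage (file names are illustrative):
#   redact_flash_log flash.log > flash_redacted.log
```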

And finally, the flash log for the command:

sudo ROOTFS_AB=1 ./flash.sh -u rsa_priv.pem -v sbk.key jetson-agx-orin-devkit-industrial mmcblk0p1

I have removed any --key, --iv and --aad information from it

The only major difference I am seeing is the different .xml files used when flashing, namely: flash_t234_qspi_sdmmc_industrial_enc_rfs.xml, flash_t234_qspi_sdmmc_industrial_enc_rootfs_ab.xml and flash_t234_qspi_sdmmc_industrial_rootfs_ab.xml.
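
One way to narrow down such a comparison is to list the partition names each layout defines and print the ones that exist in one file but not in the other. A minimal sketch, assuming every partition is declared on a single line of the form <partition name="..." ...>, as in the layout excerpts in this thread; both function names are hypothetical.

```shell
#!/bin/bash
# Print the name attribute of every <partition ...> opening tag, sorted.
partition_names() {
    sed -n 's/.*<partition[[:space:]][^>]*name="\([^"]*\)".*/\1/p' "$1" | sort -u
}

# Print partition names present in the first layout but absent from the second.
missing_partitions() {
    local tmp
    tmp="$(mktemp)" || return 1
    partition_names "$2" > "$tmp"
    partition_names "$1" | comm -23 - "$tmp"
    rm -f "$tmp"
}

# Example usage, run from the directory holding the layout files:
#   missing_partitions flash_t234_qspi_sdmmc_industrial_rootfs_ab.xml \
#                      flash_t234_qspi_sdmmc_industrial_enc_rootfs_ab.xml
```

Running it in both directions shows what each layout has that the other lacks; differences in attributes or sizes still need a regular diff.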

I am starting to look at the differences between all three that could lead to the errors

E> Cannot find partition bad-page
E> Partition bad-page not found
E> Failed to access prl
C> Task 0x0 failed (err: 0x504d090d)
E> Top caller module: DRAM_ECC, error module: PARTITION_MANAGER, reason: 0x0d, aux_info: 0x09

in the UART_log.txt after flashing with both rootfs A/B and disk encryption enabled. I have included the files here as well
flash_t234_qspi_sdmmc_industrial_enc_rfs.txt (32.0 KB)
flash_t234_qspi_sdmmc_industrial_enc_rootfs_ab.txt (33.6 KB)
flash_t234_qspi_sdmmc_industrial_rootfs_ab.txt (32.0 KB)
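
As a side note on the log lines above, the err value 0x504d090d appears to pack the same fields that the following line prints: the low byte is 0x0d (the reason), the next byte is 0x09 (the aux_info), and the top two bytes 0x50 0x4d are ASCII "PM", which would fit PARTITION_MANAGER. This layout is inferred from that one log line only, not from documentation, so the sketch below is an assumption; decode_err is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: split a 32-bit error value into the fields suggested
# by the UART log (layout is an assumption inferred from one log line).
decode_err() {
    local err=$(( $1 ))
    local c1=$(( (err >> 24) & 0xff ))
    local c2=$(( (err >> 16) & 0xff ))
    local aux=$(( (err >> 8) & 0xff ))
    local reason=$(( err & 0xff ))
    local tag
    # Render the two upper bytes as ASCII characters via octal escapes.
    tag="$(printf "\\$(printf '%03o' "$c1")\\$(printf '%03o' "$c2")")"
    printf 'module tag: %s, aux_info: 0x%02x, reason: 0x%02x\n' "$tag" "$aux" "$reason"
}

# decode_err 0x504d090d
# prints: module tag: PM, aux_info: 0x09, reason: 0x0d
```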

Do you mean that you cannot flash the board successfully when you add both ROOTFS_ENC=1 and ROOTFS_AB=1?
If so, why does your flash_log_partition_and_encryption.txt show that Secure Flashing completed at the end?

May I know what your requirement is?
Do you want to enable disk-encryption(ROOTFS_ENC=1)?
Do you want to enable rootfs a/b (ROOTFS_AB=1)?
then we can focus on your use case.

Sorry for the confusion. I want to enable disk encryption and rootfs a/b but am unable to. I am able to flash in all three cases, but the device will not successfully reboot after flashing if I add both ROOTFS_ENC=1 and ROOTFS_AB=1

Got it.
Have you modified the partition layout file, or are you using an old one from a previous release on R35.5.0?
It seems something is missing from your partition layout file.

Please modify the following lines in <Linux_for_Tegra>/bootloader/t186ref/cfg/flash_t234_qspi_sdmmc_industrial_enc_rootfs_ab.xml

- <partition name="A_cpu-bootloader" type="bootloader_stage2" oem_sign="true">
+ <partition name="A_cpu-bootloader" type="bootloader_stage2" oem_sign="true" compress="true" comp_algo="lz4"> 

- <partition name="B_cpu-bootloader" type="bootloader_stage2" oem_sign="true">
+ <partition name="B_cpu-bootloader" type="bootloader_stage2" oem_sign="true" compress="true" comp_algo="lz4">

and run the following command to flash the devkit to verify.

$ sudo ROOTFS_ENC=1 ROOTFS_AB=1 ./flash.sh -u rsa_priv.pem -v sbk.key -i "./ebk.key" jetson-agx-orin-devkit-industrial mmcblk0p1
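
The two attribute additions above can also be applied with sed instead of editing by hand. A minimal sketch, assuming the partition tags appear exactly as in the "-" lines above; add_lz4_to_cpu_bootloader is a hypothetical helper, and it keeps a copy of the previous file with an .orig suffix.

```shell
#!/bin/bash
# Hypothetical helper: append compress="true" comp_algo="lz4" to the
# A_cpu-bootloader and B_cpu-bootloader partition tags of a layout file.
# Tags that already carry the attributes no longer match and stay unchanged.
add_lz4_to_cpu_bootloader() {
    local layout="$1"
    cp -p "$layout" "$layout.orig" || return 1
    sed -E 's|(<partition name="[AB]_cpu-bootloader" type="bootloader_stage2" oem_sign="true")>|\1 compress="true" comp_algo="lz4">|' \
        "$layout.orig" > "$layout"
}

# Example usage, from the Linux_for_Tegra directory:
#   add_lz4_to_cpu_bootloader \
#       bootloader/t186ref/cfg/flash_t234_qspi_sdmmc_industrial_enc_rootfs_ab.xml
```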

Unfortunately, with that change I still get the same error on cold boot. Could it be something to do with these partitions instead?

<partition name="A_BADPAGENAME" type="BADPAGETYPE">
<allocation_policy> sequential </allocation_policy>
<filesystem_type> basic </filesystem_type>
<size> 524288 </size>
<file_system_attribute> 0 </file_system_attribute>
<allocation_attribute> 8 </allocation_attribute>
<percent_reserved> 0 </percent_reserved>
<filename> BADPAGEFILE </filename>
<align_boundary> 65536 </align_boundary>
<description> **Required.** Chain A; contains BADPAGE BLOB binary. </description>
</partition>
<partition name="B_BADPAGENAME" type="BADPAGETYPE">
<allocation_policy> sequential </allocation_policy>
<filesystem_type> basic </filesystem_type>
<size> 524288 </size>
<file_system_attribute> 0 </file_system_attribute>
<allocation_attribute> 8 </allocation_attribute>
<percent_reserved> 0 </percent_reserved>
<filename> BADPAGEFILE </filename>
<align_boundary> 65536 </align_boundary>
<description> **Required.** Chain B; contains BADPAGE BLOB binary. </description>
</partition>

Where did you get the partition layout file you shared?

It seems to be different from the one in our R35.5.0 release.

It was the one I downloaded from here: https://developer.nvidia.com/downloads/embedded/l4t/r35_release_v5.0/release

Would you be able to provide the file you are referring to?

I cannot open your link successfully.
Please download the BSP package from Jetson Linux 35.5.0 | NVIDIA Developer.
Or you can use SDK Manager to download/flash it for the devkit.
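
To confirm whether a local layout file matches the one shipped in the release, it can be compared against the same file from a freshly extracted copy of the BSP. A minimal sketch; compare_layout is a hypothetical helper and the fresh extraction directory in the usage comment is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: report whether a local layout file is byte-identical
# to a reference copy, and show a unified diff when it is not.
compare_layout() {
    local mine="$1" reference="$2"
    if cmp -s "$mine" "$reference"; then
        echo "identical: $mine"
    else
        echo "differs: $mine"
        diff -u "$reference" "$mine"
        return 1
    fi
}

# Example usage (fresh_bsp/ is an assumed scratch directory holding a newly
# extracted Linux_for_Tegra tree):
#   cfg=bootloader/t186ref/cfg/flash_t234_qspi_sdmmc_industrial_enc_rootfs_ab.xml
#   compare_layout "Linux_for_Tegra/$cfg" "fresh_bsp/Linux_for_Tegra/$cfg"
```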

This ended up fixing my issue, thank you for the help.