Orin 5.1.3 eMMC fails to boot into 5.1.1 NVMe

On an Orin AGX with a custom carrier board booted to the eMMC, with 5.1.1 installed on the eMMC and external NVMe, I performed an OTA update to 5.1.3 on the eMMC and found myself subsequently unable to boot into the NVMe installed with 5.1.1.

EMMC_5_1_3_NVME_5_1_1_NO_BOOT_2.log (147.0 KB)
The attached log shows the Orin rebooting, NVMe being selected in the UEFI boot manager, a failed boot attempt, and then a successful boot into the eMMC.
If I re-flash the eMMC with 5.1.1, I can successfully boot into the NVMe.

Is this expected behaviour?

We’re in the process of integrating the OTA update process into our product and need some clarity over what operations are supported and the best order in which to do them. We were intending to perform an OTA update on each boot device one at a time so that if an issue occurs, we still have the other disk available to fall back to.
If this is not intended, please advise how we can avoid this issue.
If it is, what is the recommended approach to updating the non-current boot device?
Can you enlighten me as to how this is different from booting into the B partition of an A/B device after updating the A partition via OTA? In that scenario, the B bootloader would not match the L4T version of the firmware either (which we suspect is the cause).

Thanks,
Connor

Hi connor.timmins,

If you flash eMMC with JP5.1.3 but flash NVMe with JP5.1.1, it is expected that the board cannot boot from the NVMe SSD, since the bootloader of JP5.1.3 may not work with the rootfs of JP5.1.1.
In this case, we would suggest performing image-based OTA update for NVMe from JP5.1.1 to JP5.1.3.

[    6.703250] Root device found: mmcblk0p1
[    6.707713] Found dev node: /dev/mmcblk0p1
[    6.733936] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
[    6.737117] Rootfs mounted over mmcblk0p1

From the log you shared, it seems it is booting from the internal eMMC and mounting mmcblk0p1 as the rootfs.
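A quick way to check which root device a captured boot log ended up on is to search for the "Root device found" line quoted above. The following is only a minimal sketch: the function names are hypothetical, and the mapping from device-name prefix to medium assumes the usual Linux naming (mmcblk* for eMMC/SD, nvme* for NVMe).

```python
import re

# Matches the initrd message quoted above, e.g. "Root device found: mmcblk0p1".
ROOT_DEVICE_RE = re.compile(r"Root device found:\s*(\S+)")


def find_root_devices(log_text):
    """Return every root device name reported in the log, in order."""
    return ROOT_DEVICE_RE.findall(log_text)


def boot_medium(device_name):
    """Classify a block device name by prefix (assumed Linux naming)."""
    if device_name.startswith("mmcblk"):
        return "eMMC/SD"
    if device_name.startswith("nvme"):
        return "NVMe"
    return "unknown"


sample = """\
[    6.703250] Root device found: mmcblk0p1
[    6.707713] Found dev node: /dev/mmcblk0p1
"""

for dev in find_root_devices(sample):
    print(dev, "->", boot_medium(dev))
```

Running this over the full attached log would list one entry per boot that reached the initrd, which makes a fallback from NVMe to eMMC easy to spot.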

May I know what’s the use case for your NVMe SSD?
Have you also flashed it with JP5.1.1 before?

Do you perform image-based OTA update for internal eMMC or external NVMe SSD?

The AGX Orin module contains both QSPI flash and eMMC.
Bootchain A/B is enabled by default; both bootchains are stored in QSPI, which holds many of the partitions used for booting.
Once your board boots from either the A or B slot, it can boot into eMMC or the NVMe SSD (selected in the UEFI menu).
After the kernel boots, it mounts the root partition according to the configuration in /boot/extlinux/extlinux.conf.
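To see which root partition a given extlinux.conf will request, you can read the root= argument from the APPEND line of the default entry. The sketch below is an illustration only: the sample file contents are an assumed, simplified layout rather than a copy of any real device's file, and root_from_extlinux is a hypothetical helper name.

```python
import re


def root_from_extlinux(conf_text):
    """Return the root= value of the entry named by DEFAULT, or None."""
    default = None
    current = None
    roots = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        rest = parts[1] if len(parts) > 1 else ""
        if keyword == "DEFAULT":
            default = rest.strip()
        elif keyword == "LABEL":
            current = rest.strip()
        elif keyword == "APPEND" and current is not None:
            # Kernel command line for the current LABEL entry.
            match = re.search(r"(?:^|\s)root=(\S+)", rest)
            if match:
                roots[current] = match.group(1)
    return roots.get(default)


# Assumed, simplified example contents (not taken from the thread).
SAMPLE_CONF = """\
TIMEOUT 30
DEFAULT primary

MENU TITLE L4T boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      INITRD /boot/initrd
      APPEND ${cbootargs} root=/dev/nvme0n1p1 rw rootwait
"""

print(root_from_extlinux(SAMPLE_CONF))
```

On a real board you would pass in the text of /boot/extlinux/extlinux.conf from the device you intend to boot.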

Hi Kevin,

In this case, we would suggest performing image-based OTA update for NVMe from JP5.1.1 to JP5.1.3.

From what I can gather from your documentation, we need to boot into the target device to OTA update it, making this not possible. Our goal is to OTA update this device to keep parity between the two but we find ourselves unable to.

From the log you shared, it seems it is booting from the internal eMMC and mounting mmcblk0p1 as the rootfs.

If you look further back in the logs, you can see:

Jetson UEFI firmware (version 202210.4-a5ac12d7-dirty built on 2024-09-25T10:13:15+00:00)

appearing twice. The first is an attempt to boot into the NVMe, which fails with this error:

E/TC:?? 00 get_rpc_alloc_res:645 RPC allocation failed. Non-secure world result: ret=0xffff0000 ret_origin=0
E/LD:   init_elf:486 sys_open_ta_bin(bc50d971-d4c9-42c4-82cb-343fb7f37896)
E/TC:?? 00 ldelf_init_with_ldelf:131 ldelf failed with res: 0xffff000c

The UEFI then loads again and boots into the eMMC successfully, which are the lines you noted. Please note that I do not have Secure Boot enabled.
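For anyone reading a similar log, the two boot attempts can be separated mechanically by splitting on the UEFI banner and noting which attempt contains the TEE errors and which one mounts a rootfs. This is a rough sketch using only strings that appear in this thread; the function name and the choice of failure markers are my own assumptions, not an official diagnostic.

```python
UEFI_BANNER = "Jetson UEFI firmware"
# Substrings taken from the error lines quoted above.
FAILURE_MARKERS = ("RPC allocation failed", "ldelf failed")
ROOT_MARKER = "Rootfs mounted over"


def summarize_boot_attempts(log_text):
    """Split a serial log into boot attempts, one per UEFI banner."""
    attempts = []
    current = None
    for line in log_text.splitlines():
        if UEFI_BANNER in line:
            current = {"errors": [], "rootfs": None}
            attempts.append(current)
        elif current is not None:
            if any(marker in line for marker in FAILURE_MARKERS):
                current["errors"].append(line.strip())
            elif ROOT_MARKER in line:
                current["rootfs"] = line.split(ROOT_MARKER, 1)[1].strip()
    return attempts


sample_log = """\
Jetson UEFI firmware (version 202210.4-a5ac12d7-dirty built on 2024-09-25T10:13:15+00:00)
E/TC:?? 00 get_rpc_alloc_res:645 RPC allocation failed. Non-secure world result: ret=0xffff0000 ret_origin=0
E/LD:   init_elf:486 sys_open_ta_bin(bc50d971-d4c9-42c4-82cb-343fb7f37896)
E/TC:?? 00 ldelf_init_with_ldelf:131 ldelf failed with res: 0xffff000c
Jetson UEFI firmware (version 202210.4-a5ac12d7-dirty built on 2024-09-25T10:13:15+00:00)
[    6.737117] Rootfs mounted over mmcblk0p1
"""

for number, attempt in enumerate(summarize_boot_attempts(sample_log), start=1):
    print(number, len(attempt["errors"]), attempt["rootfs"])
```

An attempt with errors and no mounted rootfs, followed by a clean attempt on mmcblk0p1, matches the fallback behaviour described above.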

We use the NVMe as a second boot drive for boot redundancy and durability, and to store user data. Yes, it is a known good image and this behaviour occurs on a devkit with stock settings.

I performed an OTA update on the eMMC to 5.1.3. I have confirmed that this occurs with an OTA update or a direct flash. I have also recreated this on an AGX Orin devkit.

I’m familiar with the A/B configuration of the NOR memory present on the module, configured via QSPI and the general boot process. In an A/B OTA update, where we update the A partitions on the drive, is only the A firmware on the module updated? Whereas, in a non A/B configuration, both A/B NOR components are simultaneously updated?
If so, this would explain it.

However, that leads me back to this.

We require a supported mechanism to update the non-current boot device safely. We know, from experience, that once the required firmware is in place, we can copy over an existing disk image but this poses an issue.
If an OTA update on the first of two devices fails catastrophically after updating the firmware, for example if the rootfs is corrupted, this would leave us with an unbootable device that would have to be returned to us under RMA.

Could you confirm whether my understanding of your situation, as follows, is correct?

1. flash eMMC with JP5.1.1
2. flash NVMe with JP5.1.1
3. both eMMC and NVMe work as expected
4. perform image-based OTA for eMMC from JP5.1.1 to JP5.1.3
5. boot from eMMC successful
6. switch boot into NVMe through UEFI menu FAILED (as log shows)
7. flash eMMC with JP5.1.1
8. both eMMC and NVMe work as expected

Please note that image-based OTA only updates the non-current slot.
e.g. if you boot from slot A and perform an OTA update, it will update the content in slot B and boot from slot B after the update.
As a result, you can perform the OTA update twice to update both slots.
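The slot behaviour described here can be pictured with a toy model: each OTA writes the inactive slot and then switches to it, so two runs bring both slots to the new version. This is only a sketch of the description above, not of the actual OTA tooling; SlotModel and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SlotModel:
    """Toy model of A/B slots; not the real OTA implementation."""
    current: str = "A"
    versions: dict = field(
        default_factory=lambda: {"A": "5.1.1", "B": "5.1.1"}
    )

    def other(self):
        return "B" if self.current == "A" else "A"

    def ota_update(self, new_version):
        # The update targets the non-current slot, then boots from it.
        target = self.other()
        self.versions[target] = new_version
        self.current = target
        return target


board = SlotModel()
first = board.ota_update("5.1.3")   # booted from A, so B is written
second = board.ota_update("5.1.3")  # booted from B, so A is written
print(first, second, board.versions)
```

After the first call only slot B carries 5.1.3; the second call is what brings slot A up to date.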


Do you enable rootfs a/b in your use case? Or only bootchain a/b enabled by default?

Actually, there is a recovery image available for when the rootfs is corrupted.
i.e. you can boot into the recovery kernel and fix the corrupted rootfs manually.

Yes, this flow is correct. It also occurs if step 4 is a direct flash.

Yes, this allows us to update both slots on one device. In our current product, we do not have A/B RootFS redundancy but we do have two drives.
Is it possible to select which of the pre-drive bootchains is active without using A/B rootFS? Either through nvbootctrl or EFI variable manipulation? For example, boot into the eMMC on the A bootchain and OTA update the eMMC and only the A bootchain, then boot into the NVMe on the B bootchain which is still on 5.1.1 firmware so that we can update the NVMe and B bootchain?

Our devices are often in offline deployments and access to a serial connection is limited, so recovery mode is effectively unavailable to us and an RMA is required.

If the answer to my question RE selecting pre-drive bootchain+updating one at a time is no, it would appear that using two boot drives is effectively unsupported as we cannot update them both through official mechanisms. We can engineer our own solution but we want to make sure we’re not missing anything.
Thanks for your assistance.

Do you mean internal eMMC and external NVMe?

Bootchain A/B is not related to the boot device (eMMC or NVMe).
Bootchain A/B lives in the internal QSPI, and the boot device is selected by the bootloader (UEFI).
Image-based OTA updates not only the data on the boot device but also the QSPI.
This is why Step 4 in the flow causes the boot failure in Step 6.

I have 2 suggestions for your use case as following:

Suggestion 1: Use only NVMe
Since the capacity of an NVMe drive can be much larger than the internal eMMC, many users ignore the internal eMMC once an NVMe drive is connected. Bootchain A/B is enabled by default, so you do not need to worry about the board failing to boot because of an OTA update or other operations on the current slot. When you perform an OTA update, the OTA payload updates both the internal QSPI and the external NVMe at the same time, so the update flow is simpler. You can refer to Image_based_OTA_Examples.txt for detailed steps and commands.

Suggestion 2: Use both NVMe and eMMC
Please note that bootchain A/B is not related to the boot device, so you cannot tie A to eMMC and B to NVMe as you describe. If you really want to use both the internal eMMC and the external NVMe and keep them both updated and bootable, please verify with the following steps. This flow is much more complicated because you must perform the OTA update for eMMC and NVMe separately.

1. flash eMMC with JP5.1.1
2. flash NVMe with JP5.1.1
3. both eMMC and NVMe work as expected
4. Boot from NVMe
5. perform a rootfs-only image-based OTA (by adding the -r option when generating the OTA payload) for NVMe from JP5.1.1 to JP5.1.3
6. Boot from eMMC
7. perform image-based OTA for eMMC from JP5.1.1 to JP5.1.3
8. Check if both eMMC and NVMe can work as expected
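The difference between the original failing flow and the steps above can be sketched with a small model in which the QSPI firmware is shared by both boot devices. Everything here is an assumption made for illustration: the Board class is hypothetical, the rule "a device boots only if its rootfs version equals the QSPI version" is a strict simplification of "may not work", and rootfs_only merely stands in for a payload generated with -r. Whether the real hardware behaves this way is exactly what step 8 is meant to verify.

```python
class Board:
    """Toy model: one shared QSPI firmware, one rootfs per boot device."""

    def __init__(self):
        self.qspi = None
        self.rootfs = {"emmc": None, "nvme": None}
        self.booted = None

    def flash(self, device, version):
        # Assumed: a full flash writes both QSPI and the device rootfs.
        self.qspi = version
        self.rootfs[device] = version

    def can_boot(self, device):
        # Simplification: versions must match exactly.
        return self.rootfs[device] == self.qspi

    def boot(self, device):
        if not self.can_boot(device):
            return False
        self.booted = device
        return True

    def ota(self, version, rootfs_only=False):
        if self.booted is None:
            raise RuntimeError("boot a device before running OTA")
        self.rootfs[self.booted] = version
        if not rootfs_only:
            self.qspi = version  # a full OTA also rewrites QSPI


def fresh_board():
    board = Board()
    board.flash("emmc", "5.1.1")
    board.flash("nvme", "5.1.1")
    return board


# Original flow: full OTA on eMMC first, then try to boot NVMe.
failing = fresh_board()
failing.boot("emmc")
failing.ota("5.1.3")
nvme_boots_after_emmc_ota = failing.boot("nvme")

# Suggested flow: rootfs-only OTA on NVMe, then full OTA on eMMC.
suggested = fresh_board()
suggested.boot("nvme")
suggested.ota("5.1.3", rootfs_only=True)
emmc_boots_midway = suggested.boot("emmc")
suggested.ota("5.1.3")
both_bootable = suggested.can_boot("emmc") and suggested.can_boot("nvme")

print(nvme_boots_after_emmc_ota, emmc_boots_midway, both_bootable)
```

In this model the ordering matters because the QSPI is only rewritten once, at the end, after both rootfs images are already on the new version.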