JetPack 4 to JetPack 5 update with A/B rootfs enabled without booting into recovery partition and/or without layout change

martin.herren · January 24, 2024, 3:05pm

Hej,

We have a fleet of more than 1000 AGX Xaviers, currently on JP4 with A/B rootfs enabled, that we need to upgrade to JP5.

We already successfully remotely upgraded many of them from a 32.6.1 to a 32.7.1 BSP through OTA A/B upgrade. Now we need to upgrade to BSP 35.3.1 or 35.4.1 (the carrier board manufacturer only supports 35.3.1, we might do our own 35.4.1 support if time permits). They are all at remote locations and we cannot get physical access to them, only remote, that’s why a robust A/B mechanism with rollback is necessary.

So far we have experimented, unsuccessfully, with an OTA upgrade on an AGX Xavier DevKit to validate the concept.

There are 2 problems:

1. The Xavier accepts the OTA payload and tries to reboot into recovery mode to install it. It fails to enter recovery mode (init not found) and is therefore stuck and useless until it is put into forced recovery mode and reflashed from scratch.
2. The reboot into recovery is not acceptable and not compatible with our update software or with our robustness and rollback requirements. We need to flash the inactive partition from a running system and be able to reboot directly into the new system, without going through recovery mode. This works fine between different or identical JP4 rootfs upgrades as well as between different or identical JP5 rootfs upgrades. We need the same behavior for a JP4 to JP5 upgrade. The problem is that going from JP4 to JP5 involves a layout change. Would it be possible to change the JP5 layout to match the JP4 layout, to support an upgrade without a layout change? Most partitions are the same but in a different order. I must say the JP5 layout looks saner and more future-proof for later upgrades.

Thanks and best regards,

Martin

KevinFFF · January 25, 2024, 3:18am

Are you using AGX Xavier with rootfs in internal eMMC or external NVMe?

There must be a layout change from JP4 to JP5 since the SW architecture and stack are different.

I would suggest you verify the overall process on the devkit first.
Could you share the logs from generating the OTA package and from performing the OTA update?

martin.herren · January 26, 2024, 3:32pm

That’s very unfortunate as it defeats the whole A/B purpose to have a robust and fail safe upgrade path including rollback.

Currently for JP4→JP4 as well as JP5→JP5 upgrades we have a very robust process. If anything during the flashing fails the running partition is not altered and the system continues to work.
If the flash succeeds but for some reason doesn’t boot, the rollback mechanism brings back a sane state with everything running in the previous version.

Currently it fails during the recovery step, leaving the whole system bricked until reflashed from scratch. Rollback doesn’t work.

That’s exactly what we are doing now as the carrier board manufacturer doesn’t support OTA upgrades at all and there might be additional surprises (we solved them all for JP4→JP4 upgrades already).

Yes, I’ll regenerate a clean OTA upgrade at the beginning of next week and post the generation logs as well as the logs from applying the upgrade.

Thanks and best regards,

/Martin

martin.herren · January 29, 2024, 5:27pm

OK, we got somewhat further.

The initial failure to boot into recovery, an initramfs problem (init not found), was caused by an error in the build_base_recovery_image.sh arguments: one path that should have been base_bsp was target_bsp.

As we only did JP4 to JP4 and JP5 to JP5 upgrades with A/B, we never used the recovery image, so the error went unnoticed until now (the rootfs and OTA package generation is automated). With that fixed, recovery boots.
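
An argument mix-up like this one can be caught before the script runs. The following is only a sketch, assuming the base BSP directory, the base rootfs and the target BSP directory are passed as three separate paths; check_bsp_dirs is a hypothetical helper for our automation, not part of the NVIDIA tools.

```shell
#!/usr/bin/env bash
# Hypothetical guard for an automated build; not an NVIDIA tool.
# Arguments: $1 = base BSP dir, $2 = base rootfs dir, $3 = target BSP dir
check_bsp_dirs() {
    local base_bsp="$1" base_rootfs="$2" target_bsp="$3" dir
    # All three paths must exist as directories.
    for dir in "$base_bsp" "$base_rootfs" "$target_bsp"; do
        if [ ! -d "$dir" ]; then
            echo "not a directory: $dir" >&2
            return 1
        fi
    done
    # Base and target must not resolve to the same tree.
    if [ "$(readlink -f "$base_bsp")" = "$(readlink -f "$target_bsp")" ]; then
        echo "base and target BSP directories are the same" >&2
        return 1
    fi
    # The base rootfs must not live inside the target BSP tree.
    case "$(readlink -f "$base_rootfs")/" in
        "$(readlink -f "$target_bsp")"/*)
            echo "base rootfs lies inside the target BSP tree" >&2
            return 1
            ;;
    esac
    return 0
}
```

Calling it with the same three paths that are about to be handed to build_base_recovery_image.sh, and aborting on a non-zero return, would have flagged a target path used where the base path belongs.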

Now we get stuck at


/init: line 68: modprobe: command not found
[    7.605782] Root device found: initrd

[    7.615564] hpd: switching from state 1 (Check Plug) to state 3 (Disabled)
[    7.617736] Mount initrd as rootfs and enter recovery mode
Finding OTA work dir on external storage devices
Checking whether device /dev/mmcblk?p1 exist
Looking for OTA work directory on the device(s): /dev/mmcblk0p1
Checking whether device /dev/sd?1 exist
Device /dev/sd?1 does not exist
Checking whether device /dev/nvme?n1p1 exist
Looking for OTA work directory on the device(s): /dev/nvme0n1p1
mount /dev/nvme0n1p1 /mnt
[    7.679236] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
is_boot_only_partition /mnt
OTA work directory /mnt/ota_work is not found on /dev/nvme0n1p1
Finding OTA work dir on internal storage device
mount /dev/mmcblk0p1 /mnt
[    7.745623] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null)
is_boot_only_partition /mnt
OTA work directory /mnt/ota_work is not found on /dev/mmcblk0p1
OTA work directory is not found on internal and external storage devices
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
bash-4.4#
bash-4.4#

The error seems to come from the fact that in our case mmcblk0p1/2’s /ota_work is a symlink to a subdir of nvme0n1p1.
As we are running A/B Rootfs mmcblk0p1/2 is only half the size and too small to hold the upgrade. Thus we used a symlink.

From the logs it seems that the upgrade script looks only at the root of mmcblk0p1 and of nvme0n1p1, so we’ll check whether there is a way to specify a subfolder to look into on nvme0n1p1 rather than the root.
Otherwise I’ll patch the update code to use an ota_work folder at the root of nvme0n1p1.
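
The behaviour seen in the log can be approximated with a small check. This is only a sketch, under the assumption that recovery tests for a real ota_work directory at the root of each partition it mounts; has_ota_work is a hypothetical helper, not part of the NVIDIA tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper; approximates the check suggested by the recovery log.
# Argument: $1 = mount point of the partition to inspect (e.g. /mnt)
has_ota_work() {
    local mount_point="$1"
    # -d follows symlinks, so a symlink whose target cannot be reached
    # (for example because the volume it points to is not mounted at the
    # expected path) counts as "not found".
    [ -d "${mount_point}/ota_work" ]
}
```

For example, `has_ota_work /mnt || echo "no ota_work on this partition"`. A symlink on the eMMC rootfs that points into a path on the NVMe would fail this test whenever the NVMe is not mounted at that path, which would be consistent with the "not found" messages above.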

Then I’ll see how far we get.

Now that I have a shell in the recovery partition, is there a way from there to reboot into the currently running system (boot from mmcblk0p1)? I tried to set slot 0 as bootable through nvbootctrl, but it is not available in the recovery system. As the upgrade has not yet been applied, it would be nice to be able to roll back to the working system instead of reflashing it.

Thanks and best regards.

KevinFFF · January 30, 2024, 5:54am

Image-based OTA currently updates the unused slot.
For example, if you are booting from slot A, it will update slot B after the reboot and then boot from slot B once the update is done. It seems you have an external NVMe connected, so you could just boot from the NVMe and put your OTA update payload onto it before the update.

You are in the recovery kernel. Please remove the OTA payload, reboot into the UEFI menu, and select booting from the NVMe drive.

martin.herren · January 30, 2024, 10:46am

Yes, that’s exactly what we want.

Yes, we have an NVMe but only for shared and persistent data between A/B. We don’t boot on NVMe.

We’d like to reboot on slot 0 (or slot 1) of the internal storage but it always goes into recovery:

[0008.654] I> ########## Fixed storage boot ##########
[0008.659] I> Loading kernel-bootctrl from partition
[0008.664] I> Loading partition kernel-bootctrl at 0xa0700000 from device(0x1)
[0008.677] I> A/B: bin_type (50) slot 0
[0008.677] I> Loading recovery from partition
[0008.679] I> Loading partition recovery at 0xa0700000 from device(0x1)
[0009.071] I> Validate recovery ...
[0009.071] I> T19x: Authenticate recovery (bin_type: 50), max size 0x5000000
[0009.503] I> Encryption fuse is not ON
[0009.519] I> Checking boot.img header magic ... [0009.520] I> [OK]
[0009.520] I> A/B: bin_type (51) slot 0
[0009.520] I> Loading recovery-dtb from partition
[0009.520] I> Loading partition recovery-dtb at 0x91000000 from device(0x1)
[0009.528] I> Validate recovery-dtb ...
[0009.528] I> T19x: Authenticate recovery-dtb (bin_type: 51), max size 0x400000
[0009.532] I> Encryption fuse is not ON
[0009.533] I> Kernel hdr @0xa0700000
[0009.533] I> Kernel dtb @0x91000000
[0009.536] I> decompressor handler not found
[0009.540] I> Copying kernel image (34484232 bytes) from 0xa0700800 to 0x80080000 ... [0009.556] I> Done
[0009.556] I> Move ramdisk (len: 12902618) from 0xa27e4000 to 0x92000000
[0009.561] I> Updated bpmp info to DTB
[0009.562] I> Ramdisk: Base: 0x92000000; Size: 0xc4e0da
[0009.564] I> Updated initrd info to DTB
[0009.568] W> WARN: Fail to override "console=none" in commandline
[0009.574] I> Active rootfs suffix: 
[0009.577] E> tegrabl_linuxboot_add_disp_param, du 0 failed to get display params
[0009.585] E> tegrabl_linuxboot_add_disp_param, du 0 failed to get display params
[0009.592] E> tegrabl_linuxboot_add_disp_param, du 0 failed to get display params
[0009.599] I> Active slot suffix: 
[0009.602] I> add_boot_slot_suffix: slot_suffix = 
[0009.607] I> Linux Cmdline: console=ttyTCU0,115200 root=/dev/initrd rw rootwait console=ttyTCU0,115200n8 fbcon=map:0 net.ifnames=0 video=tegrafb no_console_suspend=1 earlycon=tegra_comb_uart,mmio32,0x0c168000 base_version=R32-6 target_board=jetson-agx-xavier-devkit  video=tegrafb earlycon=tegra_comb_uart,mmio32,0x0c168000 gpt rootfs.slot_suffix= usbcore.old_scheme_first=1 tegraid=19.1.2.0.0 maxcpus=8 boot.slot_suffix= boot.ratchetvalues=0.4.2 vpr_resize sdhci_tegra.en_boot_part_access=1

What do you mean by removing the OTA payload, and how do I do it?
In the menu I select booting from eMMC, but it still boots into the eMMC’s recovery partition and not into the eMMC’s slot 0.

KevinFFF · January 31, 2024, 7:34am

Did it boot into the recovery kernel right after you flashed the board?
Or was it caused by some operation on your side (such as placing the OTA payload to perform the update)?

Please share the screenshot when you press ESC to enter UEFI menu → Device Manager → NVIDIA Configuration → L4T Configuration

martin.herren · February 8, 2024, 3:28pm

After running nv_ota_start.sh, seemingly successfully, I rebooted and it went into recovery. From there the upgrade failed.

Since nothing has been upgraded yet, I wanted to know whether it is possible to return to the currently installed system on slot 0.

As the bootloader at this point is still the JetPack 4 one, there is no UEFI menu to enter.

martin.herren · February 8, 2024, 4:31pm

Current boot into recovery, after changing the /ota_work symlink to point to the root of /dev/nvme0n1p1 instead of a subfolder:

[    7.674774] Mount initrd as rootfs and enter recovery mode
Finding OTA work dir on external storage devices
Checking whether device /dev/mmcblk?p1 exist
Looking for OTA work directory on the device(s): /dev/mmcblk0p1
Checking whether device /dev/sd?1 exist
Device /dev/sd?1 does not exist
Checking whether device /dev/nvme?n1p1 exist
Looking for OTA work directory on the device(s): /dev/nvme0n1p1
mount /dev/nvme0n1p1 /mnt
[    7.736020] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
is_boot_only_partition /mnt
Set rootfs=/dev/nvme0n1p1
Set dm_crypt=
OTA task runner nv_ota_run_tasks.sh is not found
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
bash-4.4#

It seems to look for nv_ota_run_tasks.sh directly under /mnt/ota_work. There must be an issue with my payload, as my /mnt/ota_work folder contains

bash-4.4# ls /mnt/ota_work/
Linux_for_Tegra  ota_work

Below that there are in fact two copies of nv_ota_run_tasks.sh:

/mnt/ota_work/Linux_for_Tegra/tools/ota_tools/version_upgrade/nv_ota_run_tasks.sh
/mnt/ota_work/ota_work/nv_ota_run_tasks.sh

The command used for the base recovery image:

./tools/ota_tools/version_upgrade/build_base_recovery_image.sh jetson-agx-xavier-devkit R32-6 /data/flash/jetson-agx-xavier-devkit-erx-1.6.0/Linux_for_Tegra /data/flash/jetson-agx-xavier-devkit-erx-1.6.0/Linux_for_Tegra/rootfs /ssd/nvidia/images/35.4.1/jetson-agx-xavier-devkit/Linux_for_Tegra

The command for the ota package:

./tools/ota_tools/version_upgrade/l4t_generate_ota_package.sh -s -S 14GiB -o rootfs_updater.sh -f image.tar.gz jetson-agx-xavier-devkit R32-6

KevinFFF · February 22, 2024, 6:30am

Could you perform the OTA update from R32.6.1 to R35.4.1 on the devkit (for eMMC) to verify the overall workflow for image-based OTA?
Image-based OTA for NVMe is not yet supported on AGX Xavier; support is planned for the next release (might be R35.5.0).

martin.herren · February 22, 2024, 7:53am

Yes, that’s exactly what we did. All our AGX Xavier run only on eMMC and all current OTA tests are done on a DevKit. The SSD is only for persistent storage, not for the OS.

The only thing related to the NVMe on this issue is that the /ota_work folder is on the NVMe and not on the eMMC due to not enough eMMC flash size.

Any estimate of when 35.5.x will be released? It was originally announced for December 2023.

KevinFFF · February 23, 2024, 3:57am

You have to flash the NVMe to use it as rootfs; just specifying it in extlinux.conf is not enough.

Jetson Linux 35.5.0 has just been released, please give it a try.

martin.herren · February 23, 2024, 6:48am

No, we don’t want to use it as rootfs, both A/B rootfs’ are on the eMMC. It is the same for BSP 32…x and BSP 35.x.x

The only thing where the NVMe is used, is that the /ota_work on the eMMC is not a folder but a symlink to a folder with the same name on the NVMe.

KevinFFF · February 23, 2024, 9:15am

but from your log as following…

Are you creating that symlink manually?
That does not seem to be one of the official steps in our documentation.

If you just want to perform image-based OTA for eMMC with rootfs A/B enabled, please create /ota_work and put the OTA package on the eMMC.

martin.herren · February 23, 2024, 9:45am

No.

Set rootfs=/dev/nvme0n1p1 comes from the log output from your tools.
To me it just means that your tool successfully found the ota_work directory on nvme0n1p1, which is correct and expected. Beyond that, nothing here relates to a system on the NVMe.

Yes we create the symlink from /ota_work on the eMMC to /ota_work on the NVMe. This is required as with A/B enabled we only have half of the eMMC size which is too small to hold the rootfs + the payload of the new rootfs.
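
The space constraint can be checked up front. A minimal sketch, assuming the required size in KiB is already known (for example from `du -k` on the payload); has_free_kb is a hypothetical helper, not part of the NVIDIA tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not an NVIDIA tool.
# Arguments: $1 = directory on the volume to check, $2 = required space in KiB
has_free_kb() {
    local dir="$1" required_kb="$2" avail_kb
    # -P forces one output line per filesystem and -k reports 1024-byte
    # blocks; column 4 of the data line is the available space.
    avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
    [ -n "$avail_kb" ] && [ "$avail_kb" -gt "$required_kb" ]
}
```

Running it against the active rootfs and against the NVMe mount point with the same required size shows which volume can actually hold the payload next to what is already installed.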

No, eMMC is not an option due to the size. Everything works fine with JP4 to JP4 upgrades as well as JP5 to JP5 upgrades. There must just be a little detail to figure out why JP4 to JP5 doesn’t work.

Guess we’ll need to figure it out ourselves.

martin.herren · February 23, 2024, 3:10pm

It seems I finally found the issue: due to an error in our update scripts, the OTA payload ended up unpacked under /ota_work/ota_work on the NVMe instead of under /ota_work. That didn’t cause any trouble for JetPack 4.x to JetPack 4.x or JetPack 5.x to JetPack 5.x updates (both without layout change), but it fails for JetPack 4 to JetPack 5 because of the layout change and the intermediate reboot into recovery.

Generating new artifacts and testing.

martin.herren · February 23, 2024, 3:19pm

Great ! Will probably directly use this one.

KevinFFF · February 26, 2024, 5:20am

If your OTA payload is larger than the rootfs partition on your board, you may also run out of space when it is flashed. In your case, I would suggest flashing the NVMe and using it as rootfs for more storage.

The JetPack 4 to JetPack 5 upgrade includes a partition layout change.

martin.herren · February 26, 2024, 3:07pm

Thanks for your reply. The OTA payload is not too big to be flashed onto the partition, just the flash is not big enough to hold the currently installed rootfs + the OTA payload of the new rootfs to flash.

Thanks for pointing us in the right direction; indeed the problem came from the /ota_work symlink and the underlying structure.

The first error was that the /ota_work symlink pointed to a subfolder of the NVMe instead of to its root. The second error was that, even once it was at the root of the NVMe, the extracted payload sat in a subdirectory. This works fine for upgrades without a layout change but won’t work for an upgrade with a layout change.

Once the /ota_work symlink properly points to an ota_work directory at the root of the NVMe and the payload is directly at the root of this folder, applying the update works fine and the system reboots into JetPack 5.
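
Both conditions can be verified before triggering the reboot. The following is only a sketch of such a pre-reboot check, assuming the NVMe mount point is known; check_ota_layout is a hypothetical helper, not part of the NVIDIA tools.

```shell
#!/usr/bin/env bash
# Hypothetical pre-reboot check; not an NVIDIA tool.
# Arguments: $1 = path of the ota_work symlink or directory (e.g. /ota_work)
#            $2 = mount point of the volume that should hold it
check_ota_layout() {
    local link="$1" mount_point="$2" resolved expected
    resolved="$(readlink -f "$link")" || return 1
    expected="$(readlink -f "$mount_point")/ota_work" || return 1

    # The work directory must sit directly at the root of the volume.
    if [ "$resolved" != "$expected" ]; then
        echo "ota_work resolves to $resolved, expected $expected" >&2
        return 1
    fi
    # The task runner must be directly inside it, not one level deeper.
    if [ ! -f "$resolved/nv_ota_run_tasks.sh" ]; then
        echo "nv_ota_run_tasks.sh not found directly in $resolved" >&2
        return 1
    fi
    return 0
}
```

It rejects both mistakes described above: a symlink that resolves to a subfolder of the volume, and a payload unpacked into a nested ota_work/ota_work directory.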

Now we face a new issue with partition B’s extlinux.conf containing PARTUUID of partition A as root. I’ll open a new thread for this specific issue here.

Thanks for your guidance to identify the problems so far.

system · March 12, 2024, 7:38am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
