Xavier NX production boot chain selection issues

Hey Jetson team,

We have a product that uses Xavier NX on a custom carrier board. It was initially deployed with JetPack 4.x, using a custom Yocto-based image that leverages meta-tegra for JetPack compatibility. We are having issues with in-field updates, specifically related to boot chain/slot selection.

Unfortunately, it has come to light that we need to update some of the boot firmware components, and we’re having trouble getting that done. Using RCM mode with our private keys, we can confirm that the updated components are installed to the “_b” boot chain, but we can’t seem to switch to that boot chain. MB1 appears to be fully silent (which I suppose is intended in production mode), so we can’t tell why we always start in chain 0 once MB2 starts printing. Both tegra-boot-control and nvbootctrl appear to update the SMD partition in the same way (setting slot 1 to the higher priority), but after reboot that is always flipped back and slot 1 is still marked unsuccessful.

In the default state we have:

# ./nvbootctrl dump-slots-info
Current bootloader slot: A
Active bootloader slot: A
magic:0x43424e00,             version: 3             features: 1             num_slots: 2
slot: 0,             priority: 15,             suffix: _a,             retry_count: 7,             boot_successful: 1
slot: 1,             priority: 14,             suffix: _b,             retry_count: 7,             boot_successful: 1

After we flip the boot chain we see this before rebooting:

# ./nvbootctrl set-active-boot-slot 1
# ./nvbootctrl dump-slots-info
Current bootloader slot: A
Active bootloader slot: B
magic:0x43424e00,             version: 3             features: 1             num_slots: 2
slot: 0,             priority: 14,             suffix: _a,             retry_count: 7,             boot_successful: 1
slot: 1,             priority: 15,             suffix: _b,             retry_count: 7,             boot_successful: 0

And then after rebooting we have:

# ./nvbootctrl dump-slots-info
Current bootloader slot: A
Active bootloader slot: A
magic:0x43424e00,             version: 3             features: 1             num_slots: 2
slot: 0,             priority: 15,             suffix: _a,             retry_count: 7,             boot_successful: 1
slot: 1,             priority: 14,             suffix: _b,             retry_count: 7,             boot_successful: 0
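The three dumps above can also be compared mechanically. Here is a minimal sketch, assuming the JetPack 4.x text layout shown above and that the dump is piped in on stdin; `header_value`, `slot_field` and `booted_from` are hypothetical helper names, not part of nvbootctrl.

```shell
# Print a header value, e.g. "Current bootloader slot", from a dump on stdin.
header_value() {
    sed -n "s/^$1: *//p"
}

# Print one per-slot field (priority, suffix, retry_count, boot_successful)
# for slot index $1 from a JetPack 4.x style dump on stdin.
slot_field() {
    sed -n "s/^slot: $1,.*$2: *\([^, ]*\).*/\1/p"
}

# Succeed only if the dump on stdin shows the device running from slot $1 (A or B).
booted_from() {
    [ "$(header_value 'Current bootloader slot')" = "$1" ]
}
```

After the reboot, something like `./nvbootctrl dump-slots-info | booted_from B || echo "switch to B did not stick"` would flag the behaviour described here.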

The MB2 output that makes me suspicious the SMD info isn’t being read correctly is:

[0000.642] W> No valid slot number is found in scratch register
[0000.642] W> Return default slot: _a                                                        
[0000.642] I> Active Boot chain : 0
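Since MB1 is silent on these units, the MB2 lines above are the earliest evidence of which chain was picked. A minimal sketch for checking a captured serial log, assuming the log is fed on stdin and uses exactly the message text shown above; both function names are hypothetical.

```shell
# Print the boot chain number MB2 reports (last occurrence in the log on stdin).
mb2_active_chain() {
    sed -n 's/.*I> Active Boot chain : *\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Succeed if MB2 warned that the scratch register carried no slot number,
# i.e. it fell back to the default slot instead of using a requested one.
mb2_used_default_slot() {
    grep -q 'No valid slot number is found in scratch register'
}
```

For example, `mb2_used_default_slot < serial.log && echo "MB2 fell back to the default slot"`, where `serial.log` is a placeholder for wherever the console capture is stored.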

There are a few things we’ve confirmed when testing this:

  • Even if the slot 0 and 1 data is the same (as it is at the time of fusing) we cannot switch chains
  • We’ve confirmed that on non-fused units the process works as expected (using an unsigned and unencrypted BUP)
  • We’ve confirmed that if we dangerously update the current slot 0 with the new BUP the system can boot up successfully and we can see the changes applied

Is there something that needs to be done to switch the boot chain on production fused units, given that the workflow works fine for non-production units? Is it possible we need to encrypt the SMD partitions with the SDK for them to be accepted?

Hi,

If possible, please try a newer release such as JP 5.x.

We verified that the boot chain can be switched on Xavier NX with r35.5.0:

$ sudo nvbootctrl dump-slots-info
Current version: 35.5.0
Capsule update status: 0
Current bootloader slot: A
Active bootloader slot: A
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal

$ sudo nvbootctrl set-active-boot-slot 1
$ sudo nvbootctrl dump-slots-info
Current version: 35.5.0
Capsule update status: 0
Current bootloader slot: A
Active bootloader slot: B
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal

$ sudo reboot
$ sudo nvbootctrl dump-slots-info
Current version: 35.5.0
Capsule update status: 0
Current bootloader slot: B
Active bootloader slot: B
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal

Thanks

I don’t believe this is a feasible option for fielded customer units running JetPack 4.x, given that we cannot safely switch the boot chain because of this issue. The flash layout also changed to accommodate UEFI firmware in JP5+, so a field upgrade process seems complicated. We’re looking for ways to resolve this with minimal changes and as little interruption to customers as possible; a full software stack upgrade and conversion of units to UEFI firmware/boot would be a very large (non-minimal) change.

If there are suggestions for overlaying or cherry-picking certain JetPack 5.x components back into 4.x, we can try that in our lab on a unit similar to the customer’s. If you have any other insight into why this would only affect production fused units, we can help work around that as well.

Are you internally able to reproduce the issue we’re seeing with Jetpack 4.x or is there a known issue with Jetpack 4.x that we should be aware of?

Hi,

We will set up JP 4.6.6 (L4T 32.7.6) and attempt to reproduce the issue on our Xavier NX devkit.

Thanks

Hi @anthony.squires

Could you please share the step-by-step commands you used so we can reproduce the issue, including:

  • The commands to prepare the BSP

  • The commands to flash your device

  • The commands to change the boot chain

We’ll execute them one by one on our fused devkit to verify if we can replicate the issue.

Thanks.

For repro I would suggest running standard JetPack 4.6.6 on the fused unit and making sure that the nvbootctrl can switch as expected for non-rootfs partitions (specifically looking at the MB2 output where it thinks the scratch register is unset). Given that our unfused devices can perform the operation as expected I’m struggling to see what the difference would be if not some fused behaviour or something in the BCT configs.

Could you please answer the questions that we have asked as well?

  • What are the differences between fused and unfused units when it comes to switching boot chains?
  • Is it expected that the MB1 output is silenced on fused units?
  • Is there any known bug in Jetpack 4 that this could be related to?

If you really need the details: as this is part of OE4T, there are a few extra differences in the process. Because we use a shared OTA process across all our devices (x86 and Tegra), rootfs A/B is disabled and we rely on that OTA process for redundancy instead.

To flash, we leverage the built-in utilities, which effectively call this:

BOARDID=3668 BOARDSKU=0001 FAB=100 BOARDREV= CHIPREV=2 ./tegra194-flash-helper.sh flash_l4t_tegra194_spi_emmc_p3668_avo.xml p3668-nvs960.dtb tegra194-mb1-bct-memcfg-p3668-0001-a00.cfg,tegra194-memcfg-sw-override.cfg 0xB8190000 boot.img rootfs.img "$@"

I believe tegra194-flash-helper.sh is equivalent to flash.sh, just repackaged and trimmed down.
flash_l4t_tegra194_spi_emmc_p3668_avo.xml is our flash.xml layout; the only modification is that we dropped the items in the sdmmc section so that our kernel + rootfs have full control over the OTA and recovery process. The only thing I could see affecting this would be BOOTCTRL, but I believe that was for the rootfs A/B switching. I also don’t think the BootROM or MB1 even has access to the eMMC to figure out that info; instead they rely on the SMD partitions that I mentioned in the original post.

We generate the BUP using BUP_generator.py, with a patch to odmsign.func that posts data to our signing server. That part is 100% working: we can install the BUP on the active slot and everything boots up fine, but that approach is not power-loss tolerant.

During the update on the system we use tegra-bootloader-update and tegra-boot-control to update the slots:

tegra-bootloader-update --slot-suffix "_b" "<path to bup>"
tegra-boot-control --set-active 1

As mentioned in the original post we’ve also tried using nvbootctrl with the same set of arguments (even without a change).
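The two commands above always target the slot that is not currently running. A minimal dry-run sketch of that selection, assuming the A/0/_a and B/1/_b mapping shown in the slot dumps earlier in the thread; `inactive_slot` and `plan_update` are hypothetical names, and the function only prints the commands rather than executing them.

```shell
# Map the running slot letter (A or B) to "<index> <suffix>" of the other slot.
inactive_slot() {
    case "$1" in
        A) echo "1 _b" ;;
        B) echo "0 _a" ;;
        *) echo "unknown slot: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the update commands for BUP file $2 when slot $1 is running.
plan_update() {
    target=$(inactive_slot "$1") || return 1
    index=${target% *}     # text before the space, e.g. "1"
    suffix=${target#* }    # text after the space, e.g. "_b"
    printf 'tegra-bootloader-update --slot-suffix "%s" "%s"\n' "$suffix" "$2"
    printf 'tegra-boot-control --set-active %s\n' "$index"
}
```

Printing the plan first makes it easy to review which slot would be overwritten before anything touches the boot partitions.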

I know this deviates a reasonable amount from what L4T provides off the shelf with Ubuntu, so we are also willing to run specific commands or debug further on our systems if you can give us specific instructions.

Hi @anthony.squires

We verified that the issue cannot be reproduced on JP 4.6.6 using our fused Xavier NX developer kit.
The key point for performing a bootchain switch is enabling the bootloader A/B. This must be enabled on both fused and unfused devices in order to switch the bootchain.

Below are the steps we used to switch the bootchain:

Version 1

  1. Flash the fused device:

    sudo ./flash.sh -u rsa_priv.pem -v sbk.key jetson-xavier-nx-devkit-emmc mmcblk0p1
    
  2. Enable bootloader A/B

    sudo nv_update_engine -e 
    
  3. Switch the bootchain:

    sudo nvbootctrl dump-slots-info
    sudo nvbootctrl set-active-boot-slot 1
    sudo nvbootctrl dump-slots-info
    sudo reboot
    

Version 2
Flash the fused device with the ROOTFS_AB=1 configuration.
The bootloader A/B will be enabled automatically when this option is set.

  1. Flash the fused device:

    sudo ROOTFS_AB=1 ./flash.sh -u rsa_priv.pem -v sbk.key jetson-xavier-nx-devkit-emmc mmcblk0p1
    
  2. Switch the bootchain:

    sudo nvbootctrl dump-slots-info
    sudo nvbootctrl set-active-boot-slot 1
    sudo nvbootctrl dump-slots-info
    sudo reboot
    

Please try to reproduce the issue on your Xavier NX developer kit.
If you’re able to reproduce it, kindly share the detailed procedure so we can replicate it on our developer kit and begin investigating the root cause.

Thanks,
David

We have found a solution!

Using the devkit and the process above, we were able to interrupt the process more readily than on our carrier and narrow down where the MB1 output disappeared (it was present at the time of running flash.sh) and where the boot chain selection stopped working. It turns out the root cause was that the soft fuses were missing during BUP creation, which both disables MB1 output, as mentioned originally, and changes the SMD metadata from ver 5 to ver 3 because some of the other flags were left unset.

It’s still unclear why only the production units were affected; maybe our non-production BUP had that section included for whatever reason. Regardless, that can be an exercise for us to figure out internally.

For future reference, anyone using meta-tegra should take care with tegra194-flash-helper.sh to ensure the --soft_fuses arg is set for BCT generation:

--- a/tegra194-flash-helper.sh	2025-10-10 07:41:09.827926340 -0700
+++ b/tegra194-flash-helper.sh	2025-10-10 07:42:18.852237836 -0700
@@ -340,7 +340,8 @@
          --scr_config $SCR_CONFIG \
          --scr_cold_boot_config $SCR_COLD_BOOT_CONFIG \
          --br_cmd_config $BR_CMD_CONFIG \
-         --dev_params $DEV_PARAMS"
+         --dev_params $DEV_PARAMS \
+         --soft_fuses tegra194-mb1-soft-fuses-l4t.cfg "
 
 
 if [ $bup_blob -ne 0 -o "$sdcard" = "yes" ]; then
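To keep this from regressing in a build, a check along the following lines could be run against the helper script. It is only a sketch: it assumes the layout in the diff above, where the `--soft_fuses` line directly follows the `--dev_params` line of the BCT arguments, and `has_bct_soft_fuses` is a hypothetical name.

```shell
# Succeed if, in file $1, a line containing --soft_fuses directly follows a line
# containing --dev_params. This is an adjacency heuristic based on the diff above,
# so a --soft_fuses argument elsewhere in the script does not count.
has_bct_soft_fuses() {
    awk '
        prev && /--soft_fuses/ { found = 1 }
        { prev = /--dev_params/ }
        END { exit found ? 0 : 1 }
    ' "$1"
}
```

For example, `has_bct_soft_fuses tegra194-flash-helper.sh || echo "BCT args are missing --soft_fuses"`.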

Thank you @DavidDDD for your help.

^^ The patch above would apply to something like the linked meta-tegra script below. That argument is already passed to the tegraflash script, but it is apparently position dependent and needs to be in the BCT section to actually work.

Hi @anthony.squires,

Good to know the issue has been resolved.

For your reference, here are the developer’s answers to your earlier questions.

  • What are the differences between fused and unfused units when it comes to switching boot chains?
    • In the officially released BSP, there is no difference for switching boot chains between fused and unfused units.
  • Is it expected that the MB1 output is silenced on fused units?
    • In the officially released BSP, mb1 output can be seen on both fused and unfused units.
  • Is there any known bug in Jetpack 4 that this could be related to?
    • In the officially released Jetpack 4, no such known bug is related to switching boot chains.

Thanks,
David