[Jetson Thor] JP7.1: A/B slot switching (nvbootctrl) not persisting across reboot on custom carrier board

Hi NVIDIA Team,

We are bringing up a custom carrier board for the Jetson Thor module using JetPack 7.1.

We have successfully flashed the device with A/B redundancy enabled, but we are facing an issue where A/B slot switching fails to persist after a reboot.

1. Issue Description

Currently, the system is booted into Slot B.
When we run:

sudo nvbootctrl -t rootfs set-active-boot-slot 0
sudo nvbootctrl -t bootloader set-active-boot-slot 0

Both commands execute successfully without any errors. However, after running sudo reboot, the system still boots into Slot B (Slot 1). The active-slot setting is lost or ignored.

2. Custom Board Configuration (No EEPROM)

Our custom carrier board is designed without an EEPROM. Following the adaptation guide, we made these changes:

  1. Modified tegra264-mb2-bct-common.dtsi:
    - cvb_eeprom_read_size = <0x100>
    + cvb_eeprom_read_size = <0x0>

  2. Added SKIP_EEPROM_CHECK=1 in our board configuration file (p3834-0008-p4071-0000-nvme.conf).

Flashing Command Used:

sudo ROOTFS_AB=1 ./l4t_initrd_flash.sh \
--external-device nvme0n1 \
-c ./tools/kernel_flash/flash_l4t_t264_nvme_rootfs_ab.xml \
jetson-agx-thor-devkit internal

3. Cross-Validation (Crucial)

To isolate the issue, we flashed the official NVIDIA Thor Developer Kit using the exact same BSP, kernel, device tree, and flashing command.

  • On the DevKit: A/B slot switching works perfectly.

  • On our Custom Board: Stuck in Slot B.

This indicates that our software stack and OTA logic are correct, and it points strongly to a hardware design difference or to low-level bootloader behavior specific to our custom board.

4. Bootloader Logs (MB1 Stage)

We captured the serial logs during boot. There are no crash/fallback errors (BR last_boot_error0: 0x0), but the BootROM decides to boot from Slot 1 extremely early:

[0000.091] I> MB1 (version: 0.23.0.2-t264-75019003-378e427f)
[0000.092] C> Boot-mode : Coldboot
[0000.092] C> MB1 last_boot_error: 0x0
...
[0000.111] I> BR last_boot_error0: 0x0
...
[0000.323] I> Boot_device: QSPI_FLASH instance: 0
[0000.345] I> QSPI-0l initialized successfully
[0000.354] C> RAM_CODE 0xc
[0000.356] I> Loading MEMBCT
[0000.359] I> Slot: 1    <-- It forces Slot 1 (B) very early
[0000.361] I> Binary [6] block-66304 (partition size: 0x60000)
...
[0000.620] I> Sku value zero. Using platform data in MB1 BCT
...

The complete bootloader-phase log is attached below.

007_boot_log_02.txt (98.1 KB)
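
For comparing several captured serial logs, the slot that MB1 reports can be pulled out with a small filter. This is only a sketch: it assumes the "I> Slot: N" line format shown in the excerpt above and the 0 = A, 1 = B numbering used in this post; mb1_slot is a hypothetical helper name, not an NVIDIA tool.

```shell
#!/usr/bin/env bash
# Sketch: print the first slot number that MB1 logs, read from stdin.
# Assumes lines shaped like "[0000.359] I> Slot: 1" as in the excerpt above.
mb1_slot() {
    awk '
        /I> Slot: [0-9]+/ {
            sub(/.*I> Slot: /, "")      # drop everything up to the number
            sub(/[^0-9].*/, "")         # drop any trailing annotation
            name = ($0 == "0") ? "A" : ($0 == "1") ? "B" : "?"
            printf "slot=%s chain=%s\n", $0, name
            found = 1
            exit
        }
        END { if (!found) { print "slot=unknown"; exit 1 } }'
}
```

Example use: mb1_slot < 007_boot_log_02.txt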

5. Our Questions:

Since the OS-level nvbootctrl command reports success but the setting does not survive a reboot, we suspect the UEFI variables / SMD data are not being correctly flushed to the QSPI flash on our custom board.

  1. Could missing hardware strapping pins (e.g., QSPI WP/Write Protect pin state) cause nvbootctrl to fail silently when writing to the SMD partition? Which specific pins should we check on the Thor module?

  2. Does the absence of the carrier board EEPROM trigger any deep fallback mechanism in UEFI/MB1 that locks the active slot to a “safe” partition, even with SKIP_EEPROM_CHECK=1 enabled?

  3. Are there any specific commands we can use to verify if the physical write to the QSPI SMD partition is actually happening?

Any guidance on troubleshooting this custom carrier board issue would be greatly appreciated. Thanks!

hello jizhaohui,

May I know what the differences are between your custom carrier board and the Jetson AGX Thor developer kit?
Since this only happens with the custom board, please double-check that you have updated l4t_generate_ota_package.sh and ota_board_specs.conf for the custom board to create OTA payloads.

hello JerryChang,

Regarding the hardware differences, the main boot-related difference is that our custom carrier board does not have an EEPROM (which is why we set SKIP_EEPROM_CHECK=1 and modified cvb_eeprom_read_size = <0x0>). Other than that, the core SOM and QSPI physical lines are unchanged from the DevKit.

Regarding l4t_generate_ota_package.sh and ota_board_specs.conf, I have not modified them.

My understanding is that these two files only affect the creation of the OTA payload. Do they also somehow affect the initial A/B slot flashing process or the initial SMD partition generation?

Please note that the issue we are facing happens before any OTA update is even attempted. We are currently just verifying the basic A/B slot flashing and manual switching mechanism. We simply boot a freshly flashed device, run nvbootctrl set-active-boot-slot 0, check that it says success, and then reboot. After the reboot, it loses the setting and boots back to Slot B.

Does nvbootctrl manual switching have any dependency on those OTA configuration files?

hello jizhaohui,

Let's run nvbootctrl dump-slots-info to print information about the slots for checking.

hello JerryChang,

Below are the commands for switching A/B slot and the execution results:



nvidia@localhost:~$ sudo nvbootctrl -t rootfs set-active-boot-slot 0
nvidia@localhost:~$ sudo nvbootctrl -t bootloader set-active-boot-slot 0
nvidia@localhost:~$

nvidia@localhost:~$ sudo nvbootctrl dump-slots-info
Current version: 38.4.0
Capsule update status: 0
Current bootloader slot: B
Active bootloader slot: A
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal
nvidia@localhost:~$
nvidia@localhost:~$ sudo nvbootctrl -t rootfs dump-slots-info
Current rootfs slot: B
Active rootfs slot: A
num_slots: 2
slot: 0,             retry_count: 3,             status: normal
slot: 1,             retry_count: 3,             status: normal
nvidia@localhost:~$ sudo nvbootctrl -t bootloader dump-slots-info
Current version: 38.4.0
Capsule update status: 0
Current bootloader slot: B
Active bootloader slot: A
num_slots: 2
slot: 0,             status: normal
slot: 1,             status: normal
nvidia@localhost:~$

After reboot, the information obtained by using the nvbootctrl dump-slots-info command is as follows:

nvidia@localhost:~$ sudo nvbootctrl -t rootfs dump-slots-info
Current rootfs slot: B
Active rootfs slot: B
num_slots: 2
slot: 0, retry_count: 3, status: normal
slot: 1, retry_count: 3, status: normal
nvidia@localhost:~$ sudo nvbootctrl -t bootloader dump-slots-info
Current version: 38.4.0
Capsule update status: 0
Current bootloader slot: B
Active bootloader slot: B
num_slots: 2
slot: 0, status: normal
slot: 1, status: normal

Hi @jizhaohui,

Based on your boot log, the system is loading B_* partitions from the very beginning of the boot process, which means the slot is being selected very early (MB1/MB2 stage), before Linux or user-space tools are involved.

According to NVIDIA's "Update and Redundancy" documentation, BootROM determines the slot differently depending on the type of boot: for a cold boot it uses the BR-BCT, while for a warm boot it uses the scratch register.

This decision is then propagated through MB1/MB2 and later boot stages, so by the time the kernel starts, the slot has already been fixed.

This matches what you are seeing: even though nvbootctrl reports active slot A before reboot, the system always boots from slot B, and the boot log confirms that only B partitions are ever loaded. This indicates that the bootloader is not using the value set by nvbootctrl, but instead is following the slot information from boot metadata (BR-BCT or scratch register).

A useful way to narrow this down further is to compare warm reboot versus full power cycle. After setting slot A with:

sudo nvbootctrl -t rootfs set-active-boot-slot 0
sudo nvbootctrl -t bootloader set-active-boot-slot 0

perform a normal reboot, check the slot after boot, and then repeat the same test with a full power cycle.

If the system switches to slot A only after a warm reboot but returns to B after a cold boot, that would indicate a mismatch between the scratch register and the BR-BCT. If both warm and cold boots always return to slot B, then the bootloader decision is consistently fixed to B, most likely because the boot metadata is not being updated or is being overridden.
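
The reasoning above can be written down as a small lookup, which helps when recording results from several boards. This is a sketch only: it assumes slot A was requested while running from slot B, and interpret_slot_test is a hypothetical helper name. Combinations that the paragraph above does not discuss are reported as such rather than guessed.

```shell
#!/usr/bin/env bash
# Sketch: interpret the warm-reboot vs. power-cycle test described above.
# $1 = slot booted after the warm reboot, $2 = slot booted after the cold boot.
# Assumes slot A was requested while the system was running from slot B.
interpret_slot_test() {
    local warm="$1" cold="$2"
    case "${warm}${cold}" in
        AA) echo "switch works for both warm and cold boot" ;;
        AB) echo "warm-only switch: scratch register and BR-BCT disagree" ;;
        BB) echo "fixed to B: boot metadata not updated or overridden" ;;
        *)  echo "combination not covered by the reasoning above" ;;
    esac
}
```

Example use: interpret_slot_test B B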

In other words, the behavior appears consistent with the bootloader design: the system is not failing to switch, but rather following slot selection data that is determined before nvbootctrl takes effect.

Regards,

Oscar Mendez
Embedded SW Engineer at RidgeRun
Contact us: support@ridgerun.com
Developers wiki: https://developer.ridgerun.com/
Website: www.ridgerun.com

Hi @OscarMendez ,

Thank you for the detailed technical explanation. Your insight regarding the BR-BCT and scratch registers makes a lot of sense.

I followed your suggestion and tested both a warm reboot (sudo reboot) and a full cold boot (power cycle) after running the nvbootctrl commands to set Slot A.
Result: In both scenarios, the system still boots back into Slot B.

Furthermore, there is one more critical anomaly I want to highlight:
Right after a fresh flash of the A/B partitions, this custom board defaults to booting from Slot B immediately. When I use the exact same flash command and payload on the official DevKit, it naturally defaults to Slot A.

Given that both warm/cold boots lock to Slot B, and even a fresh flash defaults to Slot B, it strongly suggests to me that the boot metadata (SMD/BCT) is consistently fixed to B. This leads me to two hypotheses:

  1. Hardware Write-Protect: Is it possible that a hardware write-protect (e.g., WP# pin) on the QSPI flash is preventing the host machine from properly writing the SMD/BCT data during the initial flashing process, causing it to fall back to a default “Slot B” state?

  2. Hardware Strapping: Is there any hardware strapping pin on the Jetson Thor module that dictates the default boot slot upon initialization?

Since the DevKit handles the exact same software perfectly, do you think this points more towards a QSPI physical write issue on our custom carrier board?

Thanks again for your help!

hello jizhaohui,

Please test with a warm reboot (sudo reboot) after running the nvbootctrl commands to set Slot A,
and please also share another complete set of bootloader logs for reference.

hello JerryChang,

007_custom_board_poweroff_and_restart_01.log (212.2 KB)

007_custom_board_reboot_01.log (215.0 KB)

flash_1-3_0_20260421-165733.log (10.2 MB)

007_custom_board_reboot_01.log is the log after running the nvbootctrl commands to set Slot A, followed by a warm reboot.

007_custom_board_poweroff_and_restart_01.log is the log after running the nvbootctrl commands to set Slot A, followed by a power-off and restart (cold boot).

flash_1-3_0_20260421-165733.log is the log of flashing.

hello jizhaohui,

This looks odd. Please double-check the BR-BCT, since the ROM/MB1/UEFI pick up the new active slot from the BR-BCT/variables and boot it.

hello JerryChang,

To “double check the BR-BCT” as you suggested, what is the recommended way to dump and inspect the physical BR-BCT directly from the QSPI on a running target?
Could you let me know which mtd partition corresponds to the BR-BCT or SMD on Jetson Thor, so I can use dd or hexdump to verify if the slot changes actually made it into the physical non-volatile memory?

Thank you!
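
One generic way to approach the "did anything actually get written" question is to hash a readable node before and after the command and compare. The sketch below is not an NVIDIA procedure: which node (if any) exposes the BR-BCT or SMD on Jetson Thor is not established in this thread, so the path is left to the caller, and content_hash and changed_by are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: report whether a readable file or device node changes when a
# command runs. The target path is supplied by the caller; this thread does
# not establish which node holds the BR-BCT or SMD on Jetson Thor.
content_hash() {
    local sum
    sum="$(sha256sum "$1")" || return 1
    printf '%s\n' "${sum%% *}"      # keep only the hash, drop the file name
}

changed_by() {
    # usage: changed_by <path> <command> [args...]
    local path="$1" before after
    shift
    before="$(content_hash "$path")" || return 2
    "$@" || return 2
    sync                            # flush pending writes before re-reading
    after="$(content_hash "$path")" || return 2
    if [ "$before" = "$after" ]; then
        echo "unchanged"
    else
        echo "changed"
    fi
}
```

Example use, from a root shell and with a placeholder path: changed_by /path/to/node nvbootctrl -t bootloader set-active-boot-slot 0

An "unchanged" result only says that this particular node did not change; it does not by itself prove that the write failed, since the slot state may be stored somewhere else.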

It might be worthwhile to build the debug UEFI from the edk2-nvidia sources, flash it, and see if it provides more diagnostic information.

https://github.com/NVIDIA/edk2-nvidia/tree/r38.4-updates

cp edk2-nvidia-r38.4/images/uefi_t26x_general_DEBUG.bin \
   Linux_for_Tegra/bootloader/uefi_t26x_general.bin
   
cd Linux_for_Tegra
sudo ./l4t_initrd_flash.sh --qspi-only jetson-agx-thor-devkit internal

The root cause was a hardware design difference: the THOR_BC0/1/2 pins have different default voltage levels from the DevKit.
The issue was resolved by correcting them.