How can I change an unbootable partition back to normal (AB=1)?

Hello NV experts:

I am currently encountering an issue. After enabling the A/B partitions and performing multiple OTA updates, I found that the status of one partition is marked as unbootable. How can I set the status of this partition back to normal?

run@agi:~$ sudo nvbootctrl dump-slots-info
Current version: 36.4.3
Capsule update status: 0
Current bootloader slot: B
Active bootloader slot: B
num_slots: 2
slot: 0, status: unbootable
slot: 1, status: normal
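
For anyone who wants to detect this state from a script, here is a minimal sketch. It assumes the output format is exactly what is pasted above; `list_unbootable_slots` is a made-up helper name, not part of nvbootctrl.

```shell
#!/usr/bin/env bash
# Read "nvbootctrl dump-slots-info" text on stdin and print the number of
# every slot whose status is "unbootable", one per line.
# list_unbootable_slots is a hypothetical helper, not an NVIDIA tool.
list_unbootable_slots() {
    sed -n 's/^slot: *\([0-9][0-9]*\), *status: *unbootable[[:space:]]*$/\1/p'
}

# On the device (needs root; the default target is the bootloader slots):
#   sudo nvbootctrl dump-slots-info | list_unbootable_slots
```

The function only reports the slot numbers; it does not change any slot status.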

B&R

Tao

hello 15642339119,

may I know your steps to reproduce this issue? did you use image-based OTA or a capsule update?
besides, nvbootctrl shows the bootloader slot info by default.
per your comment "After enabling the AB partitions", you’ve enabled ROOTFS_AB=1, right?

hi Jerry:

We enabled the ROOTFS_AB=1 configuration and performed several OTA updates. Suddenly, we found that we could not switch to the target partition anymore. Checking the information with nvbootctrl showed it as “unbootable.” How can I quickly clear this status to perform another OTA update? The faulty device is at the customer site, and we want to restore it to a normal state before upgrading again. Are there any commands available for this?

B&R

Tao

hello 15642339119,

you may try the "Manually Trigger the Capsule Update" procedure to update the abnormal slot.

hi Jerry:

In my OTA update package, I have configured it to upgrade both the rootfs and bootloader simultaneously. However, even after multiple OTA attempts, this status still cannot be cleared.

hello 15642339119,

we may also check your steps, please share your commands for reference.

hi Jerry:

Can I modify some variables in /sys/firmware/efi/efivars/ to clear this unbootable flag?

hello 15642339119,

please try to use the nvbootctrl command to mark the unbootable partition as bootable.
for instance, you can try $ sudo nvbootctrl mark-bootable 0 to mark slot 0 as bootable.

hi Jerry:

create ota package:

sudo ROOTFS_AB=1 EXT_NUM_SECTORS=268435456 \
  ./tools/ota_tools/version_upgrade/l4t_generate_ota_package.sh \
  --external-device nvme0n1 -S 27GiB jetson-orin-nano-devkit R36-4

ota:

sudo ./nv_ota_start.sh "${new_apk}"

hi Jerry:

It seems that nvbootctrl doesn’t support a mark-bootable option:

$ sudo nvbootctrl mark-bootable 0
nvbootctrl - command-line wrapper for the boot_control HAL.

Usage:
nvbootctrl [Options] Command

Options:
-t - target available: bootloader or rootfs. Default: bootloader.

Commands:
get-number-slots - Prints number of slots.
get-current-slot - Prints currently running SLOT.
set-active-boot-slot SLOT - On next boot, load and execute SLOT.
dump-slots-info - Prints info for slots.
verify - Verify the bootloader and rootfs boot.
verify-bl - Verify the bootloader boot only.
is-rootfs-ab-enabled - Rootfs only. Return 0 if rootfs A/B is disabled;
Return 1 if rootfs A/B is enabled, current slot is A;
Return 2 if rootfs A/B is enabled, current slot is B;

SLOT parameter is the zero-based slot-number.
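
Going by the help text above, the rootfs slots can be queried separately with `-t rootfs`, and `is-rootfs-ab-enabled` reports its result through the return code. A small sketch of decoding that code follows; `describe_rootfs_ab` is a made-up helper name, and passing `-t rootfs` to this particular command is an assumption based on the "Rootfs only" remark.

```shell
#!/usr/bin/env bash
# Translate the return code of "nvbootctrl is-rootfs-ab-enabled" into text,
# following the meanings listed in the help output.
# describe_rootfs_ab is a hypothetical helper, not an NVIDIA tool.
describe_rootfs_ab() {
    case "$1" in
        0) echo "rootfs A/B disabled" ;;
        1) echo "rootfs A/B enabled, current slot A" ;;
        2) echo "rootfs A/B enabled, current slot B" ;;
        *) echo "unexpected return code: $1" ;;
    esac
}

# On the device (assumed invocation):
#   sudo nvbootctrl -t rootfs is-rootfs-ab-enabled
#   describe_rootfs_ab "$?"
#   sudo nvbootctrl -t rootfs dump-slots-info
```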

hello 15642339119,

did you mean that you started the OTA from r36.4 and it then broke slot A (i.e. slot: 0, status: unbootable)?
or what exactly do you mean by performing multiple OTAs?

hello 15642339119,

BTW, you may try entering the UEFI menu to update the slot status.
for instance,
Device Manager → NVIDIA Configuration → L4T Configuration → OS chain A status → change to Normal

hi Jerry:

I was just making routine changes to the rootfs, then building an OTA package every few days or once a week and flashing it to the board as usual. All of a sudden, after one particular update, the board failed to boot—the partition had become unbootable.

hi Jerry:

Can I fix it without going into UEFI? The board has already been assembled and shipped to the customer, so we have no way to reach the serial console. I’d really like to complete the repair entirely from within Linux—just run a few commands and be done.

hello 15642339119,

we may need to check the error logs to find the root cause.
for instance, if a binary file is missing, the update process will not be triggered,
and the target side reports error logs like the following.

FmpTegraCheckImage: Missing required image for partition mb1: Not Found
FmpDxe(NVIDIA System Firmware): CheckTheImage() - FmpDeviceLib CheckImage failed. Status = Aborted
FmpDxe(NVIDIA System Firmware): SetTheImage() - Check The Image failed with Aborted.
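
A rough way to search a captured UART log for these messages is sketched below. The patterns are copied from the three sample lines above, so other failure modes may log different text; `find_capsule_errors` and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Print capsule-update failure lines from a captured UART log file ($1).
# Only the messages quoted in this thread are matched.
# find_capsule_errors is a hypothetical helper name.
find_capsule_errors() {
    grep -E 'FmpTegraCheckImage: Missing required image|CheckImage failed|Check The Image failed' "$1"
}

# Example (uart.log is a placeholder for your own capture):
#   find_capsule_errors uart.log
```

grep returns a non-zero status when nothing matches, which can be used to tell a clean log from one containing these errors.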

unfortunately, there’s no single command to reset the slot info.
you’ll need to identify the root cause, update the binary files to address the failure, and then enter UEFI to reset the slot info.

hi jerry:

I’ll arrange to have the defective board retrieved for lab debugging. Once I have it, I’ll upload the logs so we can pinpoint the root cause.

hi Jerry:

I’ve already obtained the problematic board. I attempted to re-OTA it, and the upgrade completed without any errors. Attached are the OTA upgrade log and the serial-port log captured after the successful upgrade and reboot. Please help me figure out why the “unbootable” status still can’t be cleared.

uartlog-after-ota.txt (1007.3 KB)

ota-log.txt (119.8 KB)

B&R

hello tangqaq,

here are the MB1 logs from the attempt to boot from slot A.

[0000.318] I> Task: Load membct                                                                                                                                                          
[0000.321] I> RAM_CODE 0x4000411                                                                                                                                                         
[0000.324] I> Loading MEMBCT                                                                                                                                                             
[0000.327] I> Slot: 0                                                         

here are the error logs from MB2.

I> Task: Ratchet update                                                                                                                                                                  
W> Skip ratchet update - OPTIN fuse not set                                                                                                                                              
I> Task: Prepare eeprom data                                                                                                                                                             
E> I2C: slave not found in slaves.                                                                                                                                                       
E> I2C: Could not write 0 bytes to slave: 0x00ae with repeat start true.                                                                                                                 
E> I2C_DEV: Failed to send register address 0x00000000.                                                                                                                                  
E> I2C_DEV: Could not read 256 registers of size 1 from slave 0xae at 0x00000000 via instance 0.                                                                                         
E> eeprom: Retry to read I2C slave device.                                                                                                                                               
E> I2C: slave not found in slaves.                                                                                                                                                       
E> I2C: Could not write 0 bytes to slave: 0x00ae with repeat start true.                                                                                                                 
E> I2C_DEV: Failed to send register address 0x00000000.                                                                                                                                  
E> I2C_DEV: Could not read 256 registers of size 1 from slave 0xae at 0x00000000 via instance 0.                                                                                         
E> eeprom: Failed to read I2C slave device                                                                                                                                               
C> Task 0x0 failed (err: 0x1f1e050d)                                                                                                                                                     
E> Top caller module: I2C_DEV, error module: I2C, reason: 0x0d, aux_info: 0x05                                                                                                           
I> Busy Spin                                                                                                                                                                             
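
To pull lines like these out of a full serial capture, a filter along the following lines may help. It assumes the `E>` / `C>` severity prefixes seen above, optionally preceded by a `[0000.318]`-style timestamp; `filter_boot_errors` is a made-up name.

```shell
#!/usr/bin/env bash
# Print only error (E>) and critical (C>) lines from a boot log file ($1),
# each prefixed with its line number in the file.
# A leading "[0000.318] " style timestamp is accepted but not required.
# filter_boot_errors is a hypothetical helper name.
filter_boot_errors() {
    grep -nE '^(\[[0-9.]+\] )?[EC]> ' "$1"
}

# Example (uart.log is a placeholder for your own capture):
#   filter_boot_errors uart.log
```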

due to this failure, the boot falls back to slot B and slot A is marked as unbootable.
are you working with a customized carrier board? the reported error is related to the EEPROM,
please check the MB2 configuration changes in the developer guide.
you may need the following modification in the MB2 BCT file (i.e. tegra234-mb2-bct-common.dtsi) when the carrier board is designed without an EEPROM.

- cvb_eeprom_read_size = <0x100>
+ cvb_eeprom_read_size = <0x0>
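
To double-check which value a BSP source tree currently carries, something like the sketch below could be used. It only reads the dtsi source and says nothing about what is actually flashed on a given unit; `get_cvb_eeprom_read_size` is a made-up name and the file location depends on your BSP layout.

```shell
#!/usr/bin/env bash
# Print the value assigned to cvb_eeprom_read_size in an MB2 BCT dtsi file ($1),
# e.g. "0x0" for a line like: cvb_eeprom_read_size = <0x0>;
# get_cvb_eeprom_read_size is a hypothetical helper name.
get_cvb_eeprom_read_size() {
    sed -n 's/.*cvb_eeprom_read_size[[:space:]]*=[[:space:]]*<\([^>]*\)>.*/\1/p' "$1"
}

# Example (adjust the path to wherever your BSP keeps the file):
#   get_cvb_eeprom_read_size path/to/tegra234-mb2-bct-common.dtsi
```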

hi Jerry:

I’m using a customized carrier board with an Orin NX 16 GB module. Out of several thousand units shipped, only one has shown this EEPROM-related error. Can I conclude this is a hardware issue?

In our current configuration, we have already changed it to 0.

yes, it looks like a hardware issue.