OTA bricks Jetson AGX Xavier (industrial) when writing SMD partitions without any checks

Hi there,

I was testing OTA on both non-A/B and A/B-enabled devices, and it seems there is a way to end up with a bricked device that requires a flash from boot recovery mode, which can only be done from an x86 machine at the moment…

The steps are as follows:

  1. Flash a JAXi with ROOTFS_AB=1 for let’s say R32-5:
    ROOTFS_AB=1 sudo -E --preserve-env=ROOTFS_AB ./flash.sh jetson-agx-xavier-industrial mmcblk0p1
  2. Generate the recovery image (I believe this is not used if ROOTFS_AB=1, although it should have been in this case):
    sudo ./tools/ota_tools/version_upgrade/build_base_recovery_image.sh jetson-agx-xavier-industrial R32-7 ~/src_sen/vcpu-32.5.1/NL4T_LinuxForTegra{,/rootfs} ~/src_sen/vcpu/NL4T_LinuxForTegra
  3. Generate payload package (here let’s omit ROOTFS_AB=1 by mistake):
    sudo ./tools/ota_tools/version_upgrade/l4t_generate_ota_package.sh jetson-agx-xavier-industrial R32-7
  4. Copy across utilities and package:
    scp ota_tools_R32.7.1_aarch64.tbz2 bootloader/jetson-agx-xavier-industrial/ota_payload_package.tar.gz sen@vpm628.local:
  5. Extract things and start OTA:
    sudo mkdir /ota
    sudo mv ota_payload_package.tar.gz /ota
    mkdir l4t && tar -C l4t -xf ota_tools_R32.7.1_aarch64.tbz2
    cd l4t/tools/ota_tools/version_upgrade
    sudo ./nv_ota_start.sh /dev/mmcblk0 /ota/ota_payload_package.tar.gz
  6. The following happens:
Command: ./nv_ota_start.sh /dev/mmcblk0 /ota/ota_payload_package.tar.gz
init_ota_log /ota_log
Creating log dir at /ota_log
Create log file at /ota_log/ota_20231113-134113.log
OTA_LOG_FILE=/ota_log/ota_20231113-134113.log
Extract /ota/ota_payload_package.tar.gz
update_nv_boot_control_in_rootfs /ota_work
2888-600-0008--1-2-jetson-agx-xavier-industrial-
check_prerequisites
get_chip_id chip_id
decompress_ota_package ota_package.tar /ota_work
decompress_ota_package: start at Mon 13 Nov 13:42:39 GMT 2023
Sha1 checksum for /ota_work/ota_package.tar (72a15c1bc3f2a4d6a1d09e1894298e1f7ba5a818) matches
decompress_ota_package: end at Mon 13 Nov 13:43:32 GMT 2023
nv_ota_update_without_layout_change.sh
Command: nv_ota_update_without_layout_change.sh
check_target_board /ota_work TARGET_BOARD
get_chip_id CHIP_ID
update_utilities_for_BUP_update /ota_work
enable_a_b_redundancy
get_rootfs_a_b_enabled ROOTFS_AB_ENABLED UNIFIED_AB_ENABLED
ROOTFS_AB_ENABLED=1
UNIFIED_AB_ENABLED=1
get_update_control /ota_work UPDATE_BOOTLOADER UPDATE_ROOTFS
UPDATE_BOOTLOADER=1, UPDATE_ROOTFS=1
update_bootloader /ota_work 1 0x19
get_update_slot bootloader 1 update_slot
update_slot=A
update_bootloader_with_UE /ota_work 0x19
Remove exsting /opt/ota_package/entry_table
Nvidia A/B-Redundancy Update tool Version 2.1
verifying update with unified a/b enabled
Verify bootloader update begins.
The rotate count has been restored.
The current slot 1 is marked as boot successful
SM: S1
The priority of current slot 1 has been restored.
Nvidia A/B-Redundancy Update tool Version 2.1
Got bl payload file: /ota_work/bl_only_payload
current slot 1
SM: S11
Set slot 0 as unbootable and start updating.
Start running: /usr/sbin/nv_bootloader_payload_updater --no-dependent-partition /ota_work/bl_only_payload
Start running: /opt/nvidia/l4t-bootloader-config/nv-l4t-bootloader-config.sh -c
2888-600-0008--1-2-jetson-agx-xavier-industrial-
Got update payload: /ota_work/bl_only_payload
Tegra User Block Device: /dev/disk/by-partlabel
Tegra Boot Block Device: /dev/mtdblock0
HEADER: MAGIC NVIDIA__BLOB__V2
HEX_VALUE 16909857
BLOB_SIZE 6925445
HEADER_SIZE 48
NUMBER_OF_ELEMENTS 19
HEADER_TYPE 0
UNCOMP_SIZE 6925445
MB1_RATCHET_LV 0
MTS_RATCHET_LV 0
ROLLBACK_FUSE_LV 0
Device TN Spec: 2888-600-0008-H.0-1-2-jetson-agx-xavier-industrial-mmcblk0p1
Device Compatible Spec: 2888-600-0008--1-2-jetson-agx-xavier-industrial-
Device TN Spec: 2888-600-0008-H.0-1-2-jetson-agx-xavier-industrial-mmcblk0p1
Device Compatible Spec: 2888-600-0008--1-2-jetson-agx-xavier-industrial-
Device is fused board.
ENTRY_TABLE:
PART  POS  LEN  VER TNSPEC TYPE UPDATABLE
spe-fw  2328  94960  12913    0  1
mb2  97288  181328  12913    0  1
cpu-bootloader  278616  471008  12913    0  1
secure-os  749624  410560  12913    0  1
bpmp-fw  1160184  856352  12913    0  1
eks  2016536  5136  12913    0  1
adsp-fw  2021672  81312  12913    0  1
rce-fw  2102984  271904  12913    0  1
mts-preboot  2374888  24016  12913    0  1
mts-mce  2398904  143200  12913    0  1
mts-proper  2542104  3430416  12913    0  1
sc7  5972520  65504  12913    0  1
bpmp-fw-dtb  6038024  182496  12913  2888-600-0008-H.0-1-2-jetson-agx-xavier-industrial-mmcblk0p1  0  1
bootloader-dtb  6220520  221920  12913  2888-600-0008-H.0-1-2-jetson-agx-xavier-industrial-mmcblk0p1  0  1
VER  6442440  101  12913  2888-600-0008-H.0-1-2-jetson-agx-xavier-industrial-mmcblk0p1  0  1
mb1  6442541  250432  12913  2888-600-0008-H.0-1-2-jetson-agx-xavier-industrial-mmcblk0p1  2  1
BCT  6692973  2888  12913  2888-600-0008-H.0-1-2-jetson-agx-xavier-industrial-mmcblk0p1  2  1
MB1_BCT  6695861  30928  12913  2888-600-0008-H.0-1-2-jetson-agx-xavier-industrial-mmcblk0p1  0  1
MEM_BCT  6726789  198656  12913  2888-600-0008-H.0-1-2-jetson-agx-xavier-industrial-mmcblk0p1  0  1
Saving Entry table to /opt/ota_package/entry_table
spe-fw write: slot = 0 offset = 2883584 bytes = 94960
mb2 write: slot = 0 offset = 3407872 bytes = 181328
cpu-bootloader write: slot = 0 offset = 14942208 bytes = 471008
secure-os write: slot = 0 offset = 19660800 bytes = 410560
bpmp-fw write: slot = 0 offset = 31719424 bytes = 856352
eks write: slot = 0 offset = 24903680 bytes = 5136
adsp-fw write: slot = 0 offset = 25427968 bytes = 81312
rce-fw write: slot = 0 offset = 27525120 bytes = 271904
mts-preboot write: slot = 0 offset = 3932160 bytes = 24016
mts-mce write: slot = 0 offset = 4456448 bytes = 143200
mts-proper write: slot = 0 offset = 4980736 bytes = 3430416
sc7 write: slot = 0 offset = 13369344 bytes = 65504
bpmp-fw-dtb write: slot = 0 offset = 34865152 bytes = 182496
bootloader-dtb write: slot = 0 offset = 18087936 bytes = 221920
VER write: slot = 0 offset = 37486592 bytes = 101
MB1_BCT write: slot = 0 offset = 786432 bytes = 30928
MB1_BCT write: slot = 0 offset = 851968 bytes = 30928
MB1_BCT write: slot = 0 offset = 983040 bytes = 30928
MB1_BCT write: slot = 0 offset = 1179648 bytes = 30928
MEM_BCT write: slot = 0 offset = 2359296 bytes = 198656
BCT slot = 0 write: offset = 16384 bytes = 2888
BCT slot = 0 write: offset = 0 bytes = 2888
BCT slot = 1 write: offset = 32768 bytes = 2888
BCT slot = 1 write: offset = 65536 bytes = 2888
BCT slot = 1 write: offset = 98304 bytes = 2888
BCT slot = 1 write: offset = 131072 bytes = 2888
BCT slot = 1 write: offset = 163840 bytes = 2888
BCT slot = 1 write: offset = 196608 bytes = 2888
BCT slot = 1 write: offset = 229376 bytes = 2888
Update bup successfully
SM: S12
Setting slot 0 as active boot slot
Unified bootloader and rootfs a/b is enabled, please reboot manually!
Nvidia A/B-Redundancy Update tool Version 2.1
SMD base part CRC invalid
Primary SMD is corrupted!
SMD base part CRC invalid
Secondary SMD is corrupted!
SMD base part CRC invalid
Primary SMD is corrupted!
SMD base part CRC invalid
Secondary SMD is corrupted!
SMD base part CRC invalid
Primary SMD is corrupted!
SMD base part CRC invalid
Secondary SMD is corrupted!
Got payload file: /ota_work/xusb_only_payload
SMD base part CRC invalid
Primary SMD is corrupted!
SMD base part CRC invalid
Secondary SMD is corrupted!
A/B has been disabled. Need to enable A/B.
Error: installing bootloader updates failed: -22
Failed to run "/usr/sbin/nv_update_engine -i bl-only --payload /ota_work/xusb_only_payload --no-reboot"
Failed to run "update_bootloader_with_UE /ota_work 0x19"
Failed to run "update_bootloader /ota_work 1 0x19"
  7. Try to reboot:
    sudo reboot
  8. The following happens:
[ 2686.467069] watchdog: watchdog0: watchdog did not stop!
[ 2686.484394] systemd-shutdow: 42 output lines suppressed due to ratelimiting
[ 2686.743906] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 2686.751294] reboot: Restarting system
Shutdown state requested 1
Rebooting system ...
WARNING: at platform/drivers/cchet value 4 is too large than ratchet level 2 from HW fuses.
[0000.034] I> MB1 (prd-version: 1.5.1.9-t194-41334769-73a9b7ef)
[0000.039] I> Boot-mode: Coldboot
[0000.042] I> Chip revision : A02P
[0000.045] I> Bootrom patch version : 15 (correctly patched)
[0000.050] I> ATE fuse revision : 0x200
[0000.054] I> Ram repair fuse : 0x0
[0000.057] I> Ram Code : 0x1
[0000.059] I> rst_source : 0xb
[0000.062] I> rst_level : 0x1
[0000.066] I> Boot-device: QSPI
[0000.068] I> Qspi flash params source = brbct
[0000.072] I> Qspi using bpmp-dma
[0000.075] I> Qspi clock source : pllp
[0000.079] I> QSPI Flash Size = 64 MB
[0000.082] I> Qspi initialized successfully
[0000.086] E> LOADER: Failed to verify SMD.
[0000.090] I> Primary SMD copy is invalid, try with secondary copy..
[0000.096] E> LOADER: Failed to verify SMD.
[0000.100] E> LOADER: Failed to verify SMD & SMD_b.
[0000.104] E> Heap corrupted !!!

If I set ROOTFS_AB=1 when generating the OTA package, then it works. Without it, I would expect either a layout change, or the SMD contents to be checked before they are written, or otherwise restored on error…

Let me know if there is a way to recover from this, other than testing everything on a twin system…
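
To illustrate the kind of check meant above, here is a minimal sketch of a guard that could run before nv_ota_start.sh. The function name ab_settings_match and its two arguments are hypothetical and not part of the NVIDIA OTA tools. The caller is assumed to already know whether rootfs A/B is enabled on the device and whether the package was generated with ROOTFS_AB=1; how those two values are obtained is not shown.

```shell
#!/bin/sh
# Hypothetical pre-flight guard, not part of the NVIDIA OTA tools.
# Refuses an OTA when the package A/B setting differs from the device's.
ab_settings_match() {
    device_ab="$1"    # assumed input: 1 if rootfs A/B is enabled on the device, else 0
    package_ab="$2"   # assumed input: 1 if the package was built with ROOTFS_AB=1, else 0
    case "$device_ab:$package_ab" in
        0:0|1:1)
            return 0 ;;
        0:1|1:0)
            echo "Refusing OTA: device ROOTFS_AB=$device_ab, package ROOTFS_AB=$package_ab" >&2
            return 1 ;;
        *)
            echo "Refusing OTA: unexpected values '$device_ab' '$package_ab'" >&2
            return 2 ;;
    esac
}
```

With something like this, the start command would only run when the guard succeeds, e.g. `ab_settings_match 1 "$PACKAGE_AB" && sudo ./nv_ota_start.sh /dev/mmcblk0 /ota/ota_payload_package.tar.gz`, where PACKAGE_AB is again a placeholder.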

hello david.fernandez,

Is it a must to stay on rel-32?
Image-based OTA with Rootfs A/B is supported. You may refer to the steps in the file below.
~/nvidia/nvidia_sdk/JetPack_5.1.2_Linux_JETSON_AGX_XAVIER_TARGETS/Linux_for_Tegra/tools/ota_tools/version_upgrade/Image_based_OTA_Examples.txt

At the moment, FRAMOS won’t release a version of the AR1335 sensor driver for the r35 releases, so I need to stay on rel-32 until they do.

I know image-based OTA is supported… I am just mentioning here that if somebody accidentally omits ROOTFS_AB=1 when generating the OTA package, attempting that OTA appears to brick the Jetson.

I tried the nv_update_engine in several ways to attempt to restore the SMD to something functional, and also nvbootctrl, but I couldn’t make them do it… maybe a --force option would be good.

So I was wondering if you know a way to recover from that after you get all those CRC invalid messages.

hello david.fernandez,

Be careful when generating the payload package: it uses different layouts with and without ROOTFS_AB.

You may need a physical setup to re-flash the target via the USB ports.

Sure, but I was hoping that there would be some checks that prevent going ahead with a wrong OTA that will brick the system.

We don’t have a setup that allows reflashing via USB in orbit, as there is no x86 computer in the payload, and the tegraflash utility seems to be compiled for 32-bit x86, which makes it difficult to run even in qemu.

So it would be good to run some sort of ARM utility to restore the partitions directly on the Jetson, or add some checks to the OTA scripts to avoid this problem…

Wonder if you could help with any of those.
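
As a rough idea of the "restore on error" part, here is a minimal sketch of a backup and restore helper using dd and cmp. The function names and the file arguments are hypothetical. I don't know the location of the SMD partitions on the boot device, nor whether writing them back this way would actually make the device boot again, so treat this only as the shape of the idea.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the NVIDIA OTA tools.

# backup_image SRC DEST: copy SRC to DEST and verify the copy byte for byte.
backup_image() {
    src="$1"; dest="$2"
    dd if="$src" of="$dest" bs=64k 2>/dev/null || return 1
    cmp -s "$src" "$dest"
}

# restore_image BACKUP TARGET: write BACKUP over the start of TARGET without
# truncating it, then verify the first size-of-BACKUP bytes of TARGET.
restore_image() {
    backup="$1"; target="$2"
    dd if="$backup" of="$target" bs=64k conv=notrunc 2>/dev/null || return 1
    size=$(( $(wc -c < "$backup") ))
    cmp -s -n "$size" "$backup" "$target"
}
```

The OTA script could call backup_image before the first write and restore_image from its error path, so that a failed update at least leaves the previous contents in place.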

Hi david.fernandez,

Image-based OTA update on AGX Xavier Industrial is not supported from r32 to r32.
It is only supported from r32 to r35, and support starts from r32.7.1.

From your first comment, you also need to set the environment variables (TARGET_BSP, BASE_BSP). Please follow the steps in the documentation: https://docs.nvidia.com/jetson/archives/l4t-archived/l4t-3274/index.html#page/Tegra%20Linux%20Driver%20Package%20Development%20Guide/updating_jetson_and_host.html#wwpID0E0QI0HA

I suggest doing a full flash first to get your device back to normal.
$ sudo ./flash.sh jetson-agx-xavier-industrial mmcblk0p1

According to this: NVIDIA Jetson Linux Developer Guide : Over-the-Air Update | NVIDIA Docs

The table shows that updates from 32.5.1 to 32.7.1 are supported.

I tested them and they work.

The variables TARGET_BSP and BASE_BSP seem to be just simple helpers for people… the OTA scripts do not use them AFAIK, and the command line that I provided does the same as indicated in the flow.

And again, I know flashing from an x86 machine works, but my use case is updating a satellite where no x86 machine is orbiting around the payload to help with that.
