Fused Orin NX R36.4.4 -> R36.5 bootloader-only OTA capsule hard-hangs before Linux

We are debugging a secure/fused Jetson Orin NX product on a custom carrier. The
unit full-flashes successfully through APX/initrd and writes QSPI/rootfs
normally. Same-BSP secure R36.5 → R36.5 full OTA with a bootloader capsule also
succeeds. The remaining failure is specifically the cross-BSP secure bootloader
move from Jetson Linux R36.4.4 to R36.5.0.

We understand custom-carrier support boundaries. The reason we are asking here
is that the same hardware full-flashes, boots both BSP rootfs states, and accepts
same-BSP secure capsules; the failure appears only when R36.4.4 firmware
processes an R36.5 bootloader payload.

The strongest reproduction uses NVIDIA OTA tooling directly:

  1. Clean R36.4.4 secure bootloader/QSPI and R36.4.4 rootfs, A/A slots.
  2. NVIDIA image-based OTA bootloader-only payload generated for R36.4.4 → R36.5.
  3. nv_ota_start.sh stages TEGRA_BL.Cap successfully.
  4. Reboot hard-hangs before Linux/USB/APX.

Primary question:

Is an R36.4.4 UEFI/FMP path expected to consume an R36.5.0 bootloader-only
capsule/BUP on a fused Orin NX from a clean R36.4.4 system, or must
R36.4.4 → R36.5.0 be performed as a full NVIDIA image-based OTA transaction
with rootfs and bootloader movement owned together by NVIDIA OTA tooling?

Context:

Module: Jetson Orin NX / P3767-0000 class
Carrier: custom carrier
Security: fused secure boot, PKC/SBK boot-chain signing, UEFI capsule auth
Base BSP: Jetson Linux R36.4.4
Target BSP: Jetson Linux R36.5.0
Storage: external NVMe, rootfs A/B, encrypted rootfs, encrypted data/UDA,
read-only rootfs
OTA tooling in the discriminator: NVIDIA ota_tools_r36.5.0_aarch64.tbz2,
nv_ota_start.sh, bootloader-only ota_payload_package.tar.gz

Known-good controls:

  1. Secure APX/initrd full flash works repeatedly on this exact hardware and
    returns the unit to R36.4.4 with clean ESRT.
  2. The direct official OTA failure below starts from clean R36.4.4 rootfs and
    R36.4.4 bootloader, both A/A, with clean ESRT.
  3. Same-BSP secure R36.5 → R36.5 full OTA with a bootloader capsule works on
    the same device family. The target rootfs and bootloader slot boot, ESRT
    reports success, and the update completes.
  4. Non-secure full OTA with capsule is proven separately.
  5. Updating ESP contents to R36.5 is not sufficient. R36.4.4 firmware can boot
    normally with R36.5 ESP contents when no capsule is staged, but still hangs
    when the R36.5 bootloader capsule is staged.
  6. NVIDIA’s DA-12672 Orin QSPI MB1-BCT overlay was checked against the current
    R36.5 generated inputs and payloads; the relevant trimmer2-val setting is
    already present.

Official OTA discriminator:

Direct clean-R36.4.4 negative control:
A p3509-native BUP was not selected at first because R36.4.4 reported the
generic compatible spec:

  COMPATIBLE_SPEC=3767-000-0000--1--p3509-a02-p3767-0000-

The payload had exact p3509 entries, so nv_ota_start.sh stopped before
staging with:

  No image is found for compatible SPEC 3767-000-0000-

Direct clean-R36.4.4 main failure:
We reran with the compatible spec overridden to the exact p3509 target:

  3767-301-0000-G.1-1-0-p3509-a02-p3767-0000-nvme0n1p1

nv_ota_start.sh then selected the BUP, returned success, staged:

  /boot/efi/EFI/UpdateCapsule/TEGRA_BL.Cap size=11069002

and rebooted. The device hard-hung for the full monitor window before
Linux/USB/APX. A normal APX-off power cycle also did not recover it. Recovery
required forced APX and full flash.

Earlier post-rootfs-hop negative control:
We first staged a payload whose wrapper board_name was adjusted to
p3509-a02-p3767-0000, but whose BUP table still contained
jetson-orin-nano-devkit entries. nv_ota_start.sh staged the capsule and
rebooted. Linux returned, bootloader stayed R36.4.4, and ESRT showed:

  last_attempt_status=6151
  last_attempt_version=0

We treat that as a clean bad-target rejection.
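
For anyone reading the status value: 6151 is 0x1807. The sketch below only classifies it against the ranges in the UEFI specification (standard codes 0 through 8, vendor-defined range 0x1000 to 0x4000). What a specific vendor code means is defined by the firmware, so the script cannot tell us that; `classify_last_attempt_status` is a hypothetical helper name.

```python
# Standard ESRT LastAttemptStatus codes as listed in the UEFI specification.
STANDARD_STATUS = {
    0: "success",
    1: "unsuccessful",
    2: "insufficient resources",
    3: "incorrect version",
    4: "invalid format",
    5: "authentication error",
    6: "power event (AC)",
    7: "power event (battery)",
    8: "unsatisfied dependencies",
}

# Vendor-defined "unsuccessful" range from the UEFI specification.
VENDOR_MIN, VENDOR_MAX = 0x1000, 0x4000


def classify_last_attempt_status(status: int) -> str:
    """Return the hex form of an ESRT status plus a coarse classification."""
    if status in STANDARD_STATUS:
        return f"0x{status:04x}: {STANDARD_STATUS[status]}"
    if VENDOR_MIN <= status <= VENDOR_MAX:
        # The exact meaning is firmware-specific and not decoded here.
        return f"0x{status:04x}: vendor-defined unsuccessful code"
    return f"0x{status:04x}: outside the ranges listed here"


print(classify_last_attempt_status(6151))
```

This places 6151 in the vendor-defined range, which is consistent with a firmware-specific rejection rather than one of the generic UEFI codes.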

Earlier post-rootfs-hop failure:
We then generated a p3509-native bootloader-only BUP whose entries are
tagged:

  3767-301-0000-G.1-1-0-p3509-a02-p3767-0000-

The capsule wrapper metadata was:

  FwVersion=0x00240500
  LowestSupportedVersion=0x00240404
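
To make the version fields easier to compare with the decimal ESRT values reported elsewhere in this post, here is a minimal decoding sketch. It assumes a 0x00MMmmpp layout (major, minor, patch in one byte each); that layout is inferred from the values themselves and is not taken from NVIDIA documentation.

```python
def decode_fw_version(value: int) -> str:
    """Decode a 32-bit firmware version, assuming a 0x00MMmmpp layout."""
    major = (value >> 16) & 0xFF
    minor = (value >> 8) & 0xFF
    patch = value & 0xFF
    return f"{major}.{minor}.{patch}"


samples = [
    ("FwVersion", 0x00240500),
    ("LowestSupportedVersion", 0x00240404),
    ("ESRT fw_version (decimal 2360324)", 2360324),
]

for label, raw in samples:
    print(f"{label}: 0x{raw:08x} -> {decode_fw_version(raw)}")
```

Under that assumption the capsule targets 36.5.0 with a floor of 36.4.4, and the ESRT decimal 2360324 is the same value as 0x00240404.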

Payload identifiers:

  ota_payload_package.tar.gz sha256=9f1bb6426788b5e70cbb2c8407afe9cd33cb8fbd8921035b6daeaa9bb3575804
  TEGRA_BL.Cap sha256=b00f9cf86ea6c5ff24ab1589dd1dd37d77c81bd2774d9aa5b59c46271fc6e8bc
  TEGRA_BL.Cap size=11069002
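
If it helps to confirm that the staged file is the same artifact, a check along these lines can be run on the target before reboot. This is a sketch, not part of the NVIDIA tooling; `file_digest_and_size` and `matches_expected` are hypothetical helper names, and the staged path is the one reported by nv_ota_start.sh above.

```python
import hashlib
import os


def file_digest_and_size(path, chunk_size=1 << 20):
    """Stream a file and return (sha256 hex digest, size in bytes)."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def matches_expected(path, expected_sha256, expected_size):
    """True only if both the digest and the size match the recorded values."""
    actual_sha256, actual_size = file_digest_and_size(path)
    return actual_sha256 == expected_sha256.lower() and actual_size == expected_size


# Values recorded in this post for the failing capsule.
CAPSULE_PATH = "/boot/efi/EFI/UpdateCapsule/TEGRA_BL.Cap"
CAPSULE_SHA256 = "b00f9cf86ea6c5ff24ab1589dd1dd37d77c81bd2774d9aa5b59c46271fc6e8bc"
CAPSULE_SIZE = 11069002

if os.path.exists(CAPSULE_PATH):
    print("capsule matches:", matches_expected(CAPSULE_PATH, CAPSULE_SHA256, CAPSULE_SIZE))
else:
    print("no staged capsule at", CAPSULE_PATH)
```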

nv_ota_start.sh returned success and staged the capsule under the ESP update
capsule directory. On reboot the device hard-hung before Linux/USB/APX. We
recovered only by forcing APX and performing a full flash. The later
clean-R36.4.4 result above shows this is not only a mixed-rootfs-state
problem.

Pre-reboot state captured on the equivalent real update path:

OsIndications=0x4 was set with efivar -w
BootChainFwNext was absent
BootChainFwStatus was absent
ESRT before reboot was clean:
  fw_version=2360324
  lowest_supported_fw_version=2360324
  last_attempt_version=0
  last_attempt_status=0
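
For reference, OsIndications=0x4 corresponds to the file-capsule-delivery bit in the UEFI specification. The sketch below decodes an OsIndications value as read from efivarfs, where each file begins with a 4-byte attribute field followed by the variable data. The helper names are hypothetical, and the sample bytes are constructed for illustration rather than captured from the device.

```python
import os
import struct

# OsIndications bit names from the UEFI specification.
OS_INDICATION_BITS = {
    0x01: "BOOT_TO_FW_UI",
    0x02: "TIMESTAMP_REVOCATION",
    0x04: "FILE_CAPSULE_DELIVERY_SUPPORTED",
    0x08: "FMP_CAPSULE_SUPPORTED",
    0x10: "CAPSULE_RESULT_VAR_SUPPORTED",
    0x20: "START_OS_RECOVERY",
    0x40: "START_PLATFORM_RECOVERY",
    0x80: "JSON_CONFIG_DATA_REFRESH",
}

# EFI global variable GUID, as exposed by efivarfs.
OS_INDICATIONS_PATH = (
    "/sys/firmware/efi/efivars/OsIndications-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def parse_efivarfs_u64(raw: bytes):
    """Split an efivarfs blob into (attributes, 64-bit little-endian value)."""
    attributes, value = struct.unpack("<IQ", raw[:12])
    return attributes, value


def decode_os_indications(value: int):
    """Return the names of the bits set in an OsIndications value."""
    return [name for bit, name in sorted(OS_INDICATION_BITS.items()) if value & bit]


# Illustrative blob: attributes 0x7 (NV|BS|RT) followed by the value 0x4.
sample = struct.pack("<IQ", 0x7, 0x4)
attrs, value = parse_efivarfs_u64(sample)
print(hex(attrs), hex(value), decode_os_indications(value))

if os.path.exists(OS_INDICATIONS_PATH):
    with open(OS_INDICATIONS_PATH, "rb") as handle:
        _, live = parse_efivarfs_u64(handle.read())
    print("live OsIndications:", hex(live), decode_os_indications(live))
```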

What we have already ruled out:

  • Rootfs extraction/write failure: rootfs-only secure update works and completes.
  • Generic A/B rollback issue: same-BSP secure OTA with capsule works.
  • Capsule wrapper auth as the only issue: same-BSP secure capsule works with the
    trusted wrapper path; cross-BSP still fails.
  • ESP freshness as the only issue: both ESP partitions were updated to R36.5 and
    no-capsule boot worked, but cross-BSP capsule still hung.
  • FmpCapsuleSinglePartitionChain=0 as a quick fix: still hung.
  • Missing DA-12672 QSPI MB1-BCT overlay: audited and present.
  • Stale BootChainFwNext: absent before the failing reboot.

Focused questions:

  1. Is R36.4.4 UEFI/FMP expected to apply an R36.5.0 bootloader-only capsule/BUP
    on fused Orin NX from a clean R36.4.4 system, or is the supported
    R36.4.4 → R36.5.0 path only a full NVIDIA image-based OTA transaction?
  2. Should R36.4.4 report the generic compatible spec above on this module, and
    should a valid R36.4.4 → R36.5 p3509 payload use generic or exact BUP
    entries for matching?
  3. Are there required BR-BCT, ratchet, rollback, or BootChain variables beyond
    staging TEGRA_BL.Cap and setting OsIndications=0x4?
  4. Is overriding COMPATIBLE_SPEC to the exact p3509 target a valid diagnostic,
    or does that bypass an important NVIDIA compatibility guard?
  5. What firmware logs or UEFI debug switches can capture failures before Linux
    when UART capture is empty and the unit does not return to ESRT?
  6. Is AutoUpdateBrBct required for this R36.4.4 → R36.5 secure transition?
    If yes, how should it be set when it appears boot-service/nonvolatile only
    and cannot be written from Linux runtime with efivar?
  7. What exactly does ESRT last_attempt_status=6151 mean for Jetson FMP in
    R36.4.4 when the BUP target is wrong?
  8. Is /dev/mtdblock0 expected to be available from Linux on secure Orin NX
    R36.5 for nv_bootloader_payload_updater, or is that helper not intended for
    this production layout?
  9. For bootloader-only l4t_generate_ota_package.sh -b, which inputs are
    mandatory on external NVMe + rootfs A/B + encrypted rootfs/UDA + UEFI secure
    boot? Should bootloader-only mode still include rootfs UUIDs, UDA UUID,
    disk encryption key, ROOTFS_AB=1, ROOTFS_ENC=1, --uefi-keys, --uefi-enc,
    and exact custom-board spec entries?
  10. If NVIDIA image-based OTA must own rootfs and bootloader movement together,
    what is the recommended migration shape for a production fleet currently
    on R36.4.4 with an existing rootfs A/B update client?

We can provide redacted l4t_generate_ota_package.sh logs, BUP table summaries,
FMP metadata, nv_ota_start.sh logs, pre-reboot ESRT/nvbootctrl/efivars,
post-recovery ESRT/slot state, and full-flash success logs.

hello 997paul,

May I double-check? p3509 is the name of the Jetson Xavier NX Developer Kit.
Please be aware of a footnote in the Jetson FAQ | NVIDIA Developer page,
which I recap below:

†† Jetson Orin NX & Jetson Orin Nano series modules are not pin-compatible with Jetson Xavier NX series modules, but you can design a carrier board for the I/Os they have in common, such that both modules are supported.

We suggest updating l4t_generate_ota_package.sh and ota_board_specs.conf for your custom board when creating OTA payloads.
See also Topic 332980 for an example; please update CUSTOMIZE_JETSON and customize-jetson to your board name accordingly.

The last_attempt_status=6151 case looks like a BUP target/image mismatch.
Please confirm whether the OTA package contains entries matching the device’s actual /etc/nv_boot_control.conf TNSPEC and COMPATIBLE_SPEC, and provide the BUP table/image list.

BTW,
did you also test the full NVIDIA image-based OTA flow instead of bootloader-only update?

Thanks Jerry.

On p3509: understood. In our case the name comes from the NVIDIA p3509-a02+p3767-0000 Orin NX config path we derived from. The module is Orin NX / P3767 on a custom carrier, so I agree the custom-board ota_board_specs.conf path is the right area to audit.

From a clean R36.4.4 recovery flash, the device reports:

/etc/nv_boot_control.conf:
TNSPEC 3767-301-0000-G.1-1-0-p3509-a02-p3767-0000-nvme0n1p1
COMPATIBLE_SPEC 3767-000-0000--1--p3509-a02-p3767-0000-

UEFI efivars:
TegraPlatformSpec=3767-301-0000-G.1-1-0-p3509-a02-p3767-0000-
TegraPlatformCompatSpec=3767-000-0000--1--p3509-a02-p3767-0000-

EEPROM/device-tree:
partnumber[nvidia]: 699-13767-0000-301 G.1
/proc/device-tree/chosen/ids: 3767-0000-301
/proc/device-tree/chosen/nvidia,sku: 699-13767-0000-301 G.1

So fab 301 is real device state. EEPROM, device tree, and the UEFI efivars agree. The host-side override only made payload selection agree with that device state; the anomaly is that COMPATIBLE_SPEC generation gives the generic 3767-000-0000- string.
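
To show exactly where the two strings diverge, here is a small field-by-field comparison sketch. The field names (boardid, fab, boardsku, boardrev, fuselevel, chiprev, board, rootdev) are my assumption about the dash-separated layout, and this is not NVIDIA's matching logic from nv_ota_start.sh; it only splits the strings reported above.

```python
# Assumed order of the first six dash-separated fields.
LEADING_FIELDS = ("boardid", "fab", "boardsku", "boardrev", "fuselevel", "chiprev")


def split_spec(spec: str) -> dict:
    """Split a TNSPEC-style string into named fields (layout is assumed)."""
    parts = spec.split("-", 6)
    if len(parts) != 7:
        raise ValueError(f"unexpected spec shape: {spec!r}")
    fields = dict(zip(LEADING_FIELDS, parts[:6]))
    # The board name itself contains dashes, so take the root device
    # as whatever follows the final dash (possibly empty).
    board, _, rootdev = parts[6].rpartition("-")
    fields["board"] = board
    fields["rootdev"] = rootdev
    return fields


def differing_fields(spec_a: str, spec_b: str) -> dict:
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    a, b = split_spec(spec_a), split_spec(spec_b)
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}


TNSPEC = "3767-301-0000-G.1-1-0-p3509-a02-p3767-0000-nvme0n1p1"
COMPATIBLE_SPEC = "3767-000-0000--1--p3509-a02-p3767-0000-"

for name, (exact, generic) in differing_fields(TNSPEC, COMPATIBLE_SPEC).items():
    print(f"{name}: exact={exact!r} generic={generic!r}")
```

Under that assumed layout, the two strings agree on board ID, SKU, fuse level, and board name, and differ in fab, board revision, chip revision, and root device.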

Agreed on 6151: that payload had non-matching devkit entries and returned cleanly to Linux. I am treating that as a bad-target BUP case, not the main failure.

The main case I am trying to understand is different. A p3509-native R36.5 bootloader-only payload has:

FwVersion=0x00240500
LowestSupportedVersion=0x00240404
30 BUP entries
entries carry either exact TNSPEC 3767-301-0000-G.1-1-0-p3509-a02-p3767-0000- or blank TNSPEC
BCT-boot-chain_backup present, update_mode=2, size=32768

Example entries:
BCT, BCT_A, BCT_B, mb1: update_mode=2
MB1_BCT, MEM_BCT, mb2, cpu-bootloader: update_mode=0
BCT-boot-chain_backup: update_mode=2

This p3509-native bootloader-only payload stages successfully. On the capsule reboot, the unit does not reach Linux or USB gadget. Recovery requires forced RCM/APX and a full reflash.

Controls that may help narrow this down:

  • Secure APX/initrd full flash works on this hardware.
  • Same-BSP R36.5 → R36.5 secure full OTA with a bootloader capsule works on this unit/class: target rootfs and bootloader slot boot, ESRT reports success, and the update completes.
  • A same-version R36.4.4 control capsule has returned ESRT success.
  • Updating the ESP contents to R36.5 is not sufficient by itself. R36.4.4 firmware can boot normally with R36.5 ESP contents when no capsule is staged, but still hangs when the R36.5 bootloader capsule is staged.
  • A BUP-table comparison across the passing R36.4.4 control, an R36.4.4-era failing diagnostic, the current R36.5 payload, and the p3509-native official R36.5 payload shows that table shape alone does not explain the outcome.

We also checked the full NVIDIA image-based OTA path. A full-system p3509 package with update_control containing both bootloader and rootfs was generated. It only selected the p3509 images when the target compatible spec was overridden to the exact device TNSPEC. Without that override, the device’s generic COMPATIBLE_SPEC selected 3767-000-0000- and no image was found.

That full-system flow did not reach capsule handoff on our current partition layout. The stock updater expects the inactive encrypted rootfs as APP_ENC_b plus separate boot partition handling. This layout has APP/APP_b as ext4 boot/container partitions, with the encrypted rootfs as a loop-LUKS file inside the container. I did not force the stock updater through that mismatch because it would no longer be a clean stock NVIDIA image-based OTA test.

What I am trying to pin down is:

  1. Is bootloader-only R36.4.4 → R36.5 capsule update expected to work on a fused Orin NX with PKC/SBK signing and UEFI capsule authentication enabled, when the BUP table matches the live TegraPlatformSpec? If yes, what preconditions should we verify before staging it?

  2. If bootloader-only cross-release update is not a supported path, what is the expected supported path for a product layout that cannot use the stock APP_ENC/APP_ENC_b rootfs updater directly? Is the intended approach to provide a custom rootfs updater while preserving the standard NVIDIA bootloader capsule handoff?

  3. For custom-board BUP matching, should ota_board_specs.conf generate entries for the exact TegraPlatformSpec, the generic TegraPlatformCompatSpec, or both? And is a generic COMPATIBLE_SPEC of 3767-000-0000--1--… expected on a fab-301 module?

I can provide redacted BUP tables and nv_ota_start logs if useful.