Previously functional Orin Nano won't boot after failed update, can't be re-flashed

I have a Jetson Orin Nano that is part of a 1/18th scale self-driving car for educational applications, which I bought from a third party.

When I first started it up, it worked fine. An NVIDIA splash screen would appear, and it would boot normally to an Ubuntu 20.04 desktop environment, which the vendor had customized to include some software specific to the car.

I made significant changes to the installed software while testing, and wanted to “roll back” to the car’s initial state. The vendor recommended that I remove the NVMe storage, plug it into an M.2 reader, and use dd or balenaEtcher to copy the contents of an .img file directly to the NVMe storage.
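
For anyone following along, here is a minimal sketch of the raw-copy step the vendor described, using dd. The function name `write_image` and both of its arguments are hypothetical. On a real system the target would be the whole NVMe device as it shows up in the M.2 reader (check with lsblk first), the command would need root, and a compressed image would have to be unpacked beforehand. dd overwrites the target without asking.

```shell
# write_image IMG TARGET: raw-copy IMG onto TARGET, then read it back.
# Both names are placeholders; TARGET is destroyed, so double-check it.
write_image() {
    img=$1
    target=$2
    # conv=fsync asks dd to flush the data to the target before it exits.
    dd if="$img" of="$target" bs=4M conv=fsync status=none || return 1
    # Compare only the first size-of-image bytes, because the disk is
    # normally larger than the image written to it.
    cmp -n "$(stat -c %s "$img")" "$img" "$target"
}
```

A call would look something like `write_image car.img /dev/sdX`, where both names are placeholders. The read-back may be served from the page cache, so a clean compare is a sanity check rather than proof that the media is good.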

I did this, and after reinstalling the NVMe card, the Orin would not boot. The NVIDIA splash screen still appeared, but was followed by some error messages from the bootloader (sorry, I did not screenshot these).

Note: the vendor uses the production version of the Orin Nano. I think it is the Jetson Orin Nano reference carrier board (P3768-0000).

I decided to try a clean install of JetPack 6 using the SDK Manager to re-flash the car. I prepared a dedicated Ubuntu 22.04 laptop for this purpose (no WSL, no Docker, clean Ubuntu 22.04 install) and followed along with the dialogs in the SDK Manager. It was easy to place the Orin Nano into recovery mode, and it was recognized by SDK Manager right away. However, the flashing process failed at around 99%. I tried several times, and also tried manually running the flash.sh script, but the process always failed (sorry, I did not save the logs).
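
Independently of SDK Manager, the host can be checked for the board with lsusb. The sketch below only looks for NVIDIA's USB vendor ID (0955); the helper name `nvidia_usb_present` is made up for illustration. A match shows that an NVIDIA device is on the bus, but it does not by itself prove recovery mode, because a normally booted Jetson that exposes its USB device port would presumably match as well.

```shell
# nvidia_usb_present: read `lsusb` output on stdin and succeed if any
# listed device carries NVIDIA's USB vendor ID (0955).
nvidia_usb_present() {
    grep -qi 'ID 0955:'
}
```

Typical use would be `lsusb | nvidia_usb_present && echo "NVIDIA USB device found"`.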

At this point, booting the car showed nothing at all. No error messages, no NVIDIA splash screen. I had not wired up the USB-serial converter yet so I don’t have any serial console logs from this part.

Finally I opted to revert to JetPack 5 (the vendor informed me they won’t support Ubuntu 22 anyway, so I need to stick with 20.04). I wiped my laptop, did a clean install, and tried installing 5.1.3. Same problem as before: the car can easily be placed into recovery mode and the flashing process starts without issue, but it always fails around 99%. I tried from the console with flash.sh and it always fails while waiting for the car to reboot. The serial console logs indicate that the car has not successfully booted up. The problem must be very early in the boot process, because the initrd environment used to flash the NVMe card is never started.

Because the SDK manager kept failing, I tried flashing directly with:

./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -p "-c ./bootloader/t186ref/cfg/flash_t234_qspi.xml" -c ./tools/kernel_flash/flash_l4t_t234_nvme.xml --showlogs --network usb0 jetson-orin-nano-devkit nvme0n1p1

But this failed with “Waiting for target to boot-up...”. Here are the logs (from the host CLI and the Orin Nano serial terminal):

2024-08-24_11-02_host_terminal_log.txt (276.5 KB)
2024-08-24_11-02_serial_terminal_log.txt (112.3 KB)

Later on, I tried directly re-flashing the QSPI only. This process completed without any error messages, but the Orin Nano still fails to boot or show a splash screen. Here are the host CLI transcript and serial terminal output:

qspi-reflash-host-log.txt (118.2 KB)
qspi-reflash-serial-log.txt (114.2 KB)

From the serial console, it appears that the error is VERY early in the boot process, so early that the UEFI bootloader written to QSPI is not loaded. The last message I see on the serial console is:

ˇ‰E/TC:?? 00 get_rpc_alloc_res:645 RPC allocation failed. Non-secure world result: ret=0xffff0000 ret_origin=0
E/LD:   init_elf:486 sys_open_ta_bin(bc50d971-d4c9-42c4-82cb-343fb7f37896)
E/TC:?? 00 ldelf_init_with_ldelf:131 ldelf failed with res: 0xffff000c
ˇ·

The boot process halts there and nothing further happens. Even the cooling fan appears unable to spin up (and the Orin Nano will start getting quite hot if I don’t shut it off).
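
In case it helps others dig through similar serial logs: the E/TC and E/LD prefixes look like error-level trace lines from the secure-world (OP-TEE) side, and they can be pulled out of a capture with grep. This is only a sketch; `tee_errors` is a made-up helper name, and the pattern is based solely on the three lines quoted above.

```shell
# tee_errors LOGFILE: print lines containing an E/TC: or E/LD: tag,
# prefixed with their line numbers. -a makes grep treat the capture as
# text even when it contains stray non-printable bytes.
tee_errors() {
    grep -a -n -E 'E/(TC|LD):' "$1"
}
```

For example, `tee_errors 2024-08-24_11-02_serial_terminal_log.txt` would list every matching line in the serial capture attached earlier.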

What am I doing wrong? Why is the failure so early in the process? Why won’t the UEFI bootloader run?

Thanks so much! I’m at a loss… I’ve been at it for 3 days now with nothing to show for it.

Hi,

Sorry, but I just want you to run this command and share the whole log with me.

Run this on rel-35.5 and share the host-side log and the UART log. That’s all.

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 \
  -c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" \
  --showlogs --network usb0 jetson-orin-nano-devkit internal

Do not change any parameter in this command.

Also, is this an NV devkit or a custom board? It sounds like a custom board.

Hi Wayne,

I will run that command as soon as I get into the office (not at work today) and will share the host + serial console logs.

There are no obvious markings on the carrier board, but I don’t think it’s 3rd party. I think it’s “Jetson Orin Nano 8GB-DRAM (P3767-0003)”, the “production” board listed here.

A couple questions before I run your command.

  1. Doesn’t the “internal” keyword try to flash eMMC or the SD card? My board doesn’t have either of those.

  2. Would it help you to see the .img file? I looked at the partition table inside the .img file, and it certainly “looks” right (A/B boot partitions at the top of the table).

If it’ll help, I will upload the 18 GB file somewhere and share a link (FYI it unpacks to ~120 GB, I guess there’s a lot of empty space in the image).

Just in case it helps, here’s a link to the .img file the vendor asked me to copy to the NVMe card.

Here is the md5 checksum

It is 18 GB compressed, on unpacking it’s ~120 GB.
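
For anyone downloading the image, the checksum can be verified before writing anything to the NVMe drive. A minimal sketch follows; the helper name `verify_md5` is hypothetical, and the expected digest has to be copied from the checksum posted above.

```shell
# verify_md5 FILE EXPECTED: succeed only if FILE's MD5 digest is EXPECTED
# (lowercase hex, as printed by md5sum).
verify_md5() {
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}
```

Usage would be along the lines of `verify_md5 image-file expected-digest && echo OK`, with both arguments being placeholders.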

I’ll run the command you asked me to run as soon as I get a chance.

Hi jeremydavispedersen

Please take a picture of your board. Your answer doesn’t tell me anything: P3767-0003 can run on the NV devkit and also on third-party carrier boards.

I suspect it is a custom carrier board, because if this were an NV devkit there would be no “vendor” providing you with anything for flashing.

And there is no need to worry about the “internal” keyword. The command comes from the official Quick Start guide:

https://docs.nvidia.com/jetson/archives/r35.5.0/DeveloperGuide/IN/QuickStart.html

You just need to provide the thing I asked for. No need to provide anything else yet.

If your previous log is correct, your board has already booted into UEFI. By default UEFI does not print any debug log, which makes it look as if the boot stopped for no reason. So I don’t see anything that indicates “it cannot be flashed”.
Try my flash command, and then we will decide the next step based on your log.

I went back into the office this afternoon and ran the command, as you requested:

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" --showlogs --network usb0 jetson-orin-nano-devkit internal

Minicom did not generate any output at all. Here’s the log from the terminal window on the Ubuntu 20.04 host:

flash-internal.txt (230.5 KB)

I will provide some pictures of the board shortly.

Sorry, you can disregard my last message: the board wasn’t powered up. I’m collecting logs again now.

FYI: I have been leaving the recovery mode jumper inserted while running the script. Is that OK?

Here are the photos of the board:

Here are the logs you requested:

host-cli-log.txt (279.2 KB)
minicom-serial-log.txt (39.6 KB)

  1. For the picture, you may need to remove the M.2 devices so that I can see the label underneath them.

  2. The flash log shows what I expected: the board boots into UEFI and then gets stuck for some reason.
    To find out what is going on, you need to enable UEFI logging by rebuilding the UEFI binary as a debug build.

Please go to this github,

Download the rel-35.5 version from the tags and rebuild the UEFI binary. The build produces both a release binary and a debug binary.
Replace the UEFI binary under your BSP with the debug version, then reflash and capture the logs again.

Thanks for replying so quickly! You’re really being very helpful. 👍

Here is a photo of the board with the modules removed. Unfortunately there is no silkscreen underneath to tell me who makes the board…it’s all blank:

I’ll try building the debug version of the UEFI tomorrow.

The GitHub page doesn’t seem to have any instructions on what to do with the build artifacts. Do I copy them to Linux_for_Tegra/bootloader/uefi_jetson.bin? Is this the right document to be looking at?

The wiki of the document has this info.

As there is no info printed on the board, it is a custom board.

I built the UEFI this morning, and moved uefi_Jetson_DEBUG.bin from build/nvidia-uefi/images to nvidia/nvidia_sdk/JetPack_5.1.3_Linux_JETSON_ORIN_NANO_TARGETS/Linux_for_Tegra/bootloader. I also renamed the file to uefi_jetson.bin to match the name used by the original UEFI .bin file.
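
For reference, the copy-and-rename step can be scripted so that the stock binary is kept. This is a rough sketch of what I did by hand, not an official procedure: the function name `install_debug_uefi` and the `.orig` backup suffix are my own conventions, and files under the SDK Manager tree may need sudo to modify.

```shell
# install_debug_uefi DEBUG_BIN L4T_DIR: back up the stock UEFI image once,
# then copy the debug build over it under the name uefi_jetson.bin.
install_debug_uefi() {
    debug_bin=$1
    bootloader_dir=$2/bootloader
    # Keep the original so the release UEFI can be restored later.
    if [ ! -e "$bootloader_dir/uefi_jetson.bin.orig" ]; then
        cp "$bootloader_dir/uefi_jetson.bin" \
           "$bootloader_dir/uefi_jetson.bin.orig" || return 1
    fi
    cp "$debug_bin" "$bootloader_dir/uefi_jetson.bin"
}
```

With the paths from this post, the call would be `install_debug_uefi build/nvidia-uefi/images/uefi_Jetson_DEBUG.bin nvidia/nvidia_sdk/JetPack_5.1.3_Linux_JETSON_ORIN_NANO_TARGETS/Linux_for_Tegra`.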

FYI, none of that is documented in the GitHub Wiki. I was able to figure out what to do by following along with the notes on “miniUEFI” here. I realize this page is for rel-36.3 but it’s the only page I could find that makes any mention of what to do with custom UEFI .bin files.

I have re-run the script in the way you specified:

sudo ./tools/kernel_flash/l4t_initrd_flash.sh --external-device nvme0n1p1 -c tools/kernel_flash/flash_l4t_external.xml -p "-c bootloader/t186ref/cfg/flash_t234_qspi.xml" --showlogs --network usb0 jetson-orin-nano-devkit internal

Here are the new logs:

host-termina-log.txt (279.2 KB)
serial-log.txt (89.2 KB)

Turns out I forgot to install the cards (hence the PCIe error messages in the serial console logs above).

I’m re-flashing now and will share new logs shortly.

Here are the new logs:

host-cli-logs.txt (279.1 KB)
serial-console-logs.txt (89.2 KB)

Hi,

We noticed that your UEFI log indicates this is an NV devkit. But I wonder why it is ES1.

Eeprom Product Id: 699-13768-0000-ES1 A.3

Also, the log below is related to the USB driver. Could you try disabling the USB part in the device tree and see if that works?

add-symbol-file /build/nvidia-uefi/Build/Jetson/DEBUG_GCC5/AARCH64/Silicon/NVIDIA/Drivers/XhciControllerDxe/XhciControllerDxe/DEBUG/XhciControllerDxe.dll 0x239667000
Loading driver at 0x00239666000 EntryPoint=0x0023966CDCC XhciControllerDxe.efi
Deassert pg: 12
Deassert pg: 10
Assert pg: 12
Assert pg: 10
Deassert pg: 12
Deassert pg: 10

Sorry, I’m still very new at this.

How do I disable the USB part in the “device tree”? Where is that? What file(s) do I need to edit?

Yes. I noticed that it reports as a “Dev Kit”, too… and yet it clearly isn’t: there’s no SD card slot anywhere on the board.

  1. When I said “devkit”, I meant that the carrier board you are using is an official NV board.
    The NV devkit carrier board works with all Orin Nano/NX modules. The SD card slot is actually located on the module, not on the carrier board, so it is expected that you don’t see an SD card slot on it.

  2. Could you hold on for a while so that we can check in more detail? The board ID in your log is actually rather strange.

Thanks! I appreciate that.

Another quick question: the power supply I was sold is only rated for 12V @ 1A. Is this actually enough to power the Orin Nano? I have not noticed any power problems so far, but I worry this is not enough power.

Hi @jeremydavispedersen

I double-checked with our internal team. The issue seems to be that your board is a p3768-ES1 board, which is a very old engineering sample board and not the common p3768 board on the market now.

That is why the flash fails.
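
For others who want to check their own board, the ID can be read from a debug UEFI serial log like the one in this thread. The sketch below assumes the log contains an “Eeprom Product Id:” line in the same form as quoted earlier, and that engineering samples carry an -ES<n> suffix as this board does. Both are assumptions drawn from this one case, and the helper names are hypothetical.

```shell
# eeprom_product_ids LOGFILE: print each distinct "Eeprom Product Id"
# value found in the log (carriage returns from the serial capture removed).
eeprom_product_ids() {
    grep -a -o 'Eeprom Product Id: .*' "$1" | tr -d '\r' |
        sed 's/^Eeprom Product Id: //' | sort -u
}

# is_engineering_sample ID: succeed if ID contains an -ES<digit> suffix.
is_engineering_sample() {
    case $1 in
        *-ES[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On the log from this thread, `eeprom_product_ids serial-console-logs.txt` should print the 699-13768-0000-ES1 A.3 value quoted above, which `is_engineering_sample` would then flag.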

Ok, so what are my options? Is there a workaround I can try?

The vendor who sold me the Orin Nano (as part of a robotic car kit) has not told me where they bought the board from, and I don’t know if they’d be able to RPA the board or not.

Ideally I would like to get the board back into a working state so that I can return it to the vendor who sold it to me, and directly purchase my own dev kit instead (so I end up with a newer board).