Bootloader drops to the UEFI shell or attempts network boot after flashing a backed-up image

After cloning an Orin using the tools/backup_restore scripts, the cloned image either boots to the UEFI shell or tries to boot from the network, taking about 5 minutes before it actually boots from the NVMe.
The image was cloned using this command:
sudo ./tools/backup_restore/l4t_backup_restore.sh -e nvme0n1 -b custom-board-conf

And restored with:
sudo ./tools/backup_restore/l4t_backup_restore.sh -e nvme0n1 -r custom-board-conf

The goal is to avoid manually entering the boot configuration to change the boot order or delete the other boot options.

We have tried modifying the bootloader following the UEFI adaptation instructions, removing the other devices from the boot options, but they come back after cloning. We also tried setting new devices to be added at the top and at the bottom of the boot order, with similar results: it either drops to the shell or puts the network on top.

For development the boot order can be changed manually, but we are moving into production and need this to work automatically. Thanks for any help!

Do you have Ethernet cables plugged in?
Unplug them if you do.
Or remove the network boot stack from the UEFI source code:

Thank you, but it’s not that simple: the Ethernet is part of the board design, is internal to the PCB, and cannot be unplugged.
I posted this issue as requested after a short discussion with the Jetson team at GTC today. Some changes to the backup scripts might be required, and a few ideas for solving the issue were discussed. Hopefully this thread can serve as documentation for the fix.

This is not related to the backup script.

Either try removing the network stack from the UEFI source code, as I mentioned earlier,
or make UEFI put newly detected devices at the bottom of the boot order:
https://docs.nvidia.com/jetson/archives/r35.5.0/DeveloperGuide/SD/Bootloader/UEFI.html#customizing-the-default-boot-mode-in-the-configuration-file
Follow the same approach described there, but change the value of NewDeviceHierarchy from 01 to 00.

This does not work because the Ethernet interfaces are added before the NVMe, so it still tries to boot from the network, resulting in a delay of more than 5 minutes.

Do you mean this one does not work?

Or this one?

I tried both and neither works. Changing NewDeviceHierarchy either drops me to a shell or puts the network devices on top.
Removing the network devices from the UEFI code still adds them on first boot. Maybe I’m not doing it right? For us, leaving the NVMe as the sole boot source would probably work. However, the conversations at GTC indicated that when flashing the standard JetPack image, the NVMe is correctly selected as the boot source on first boot. So it was suggested that, when doing the backup and restore, some variables in the QSPI image may need to be deleted so that it does not expect the specific NVMe it was backed up from, and instead behaves the same way it does when flashing the standard JetPack image.

How did you flash it?

@Jaime_M – your initial post talks only about backing up the NVMe. Are you backing up and restoring the QSPI too? Or are you using a “QSPI-only” flash config to program the QSPI and then restoring only the NVMe?

I haven’t heard back from you, but I have reproduced your issue and have a potential solution. First, here is why you see this behavior:
When a board is first booted, UEFI records the hardware configuration, including things like MAC addresses, inside the uefi_variables partition. This lets UEFI tell from one boot to the next whether any new devices have been added. By default, newly added devices go to the top of the boot order (e.g. when you plug in a USB drive to reflash). In your case the MAC address of the original board is recorded, so when that image is cloned to a new board, it looks as if a different Ethernet adapter was added to the system. That adapter goes to the top of the list and causes the boot delay you’re seeing.

So my idea is to post-process your QSPI0.img to “erase” the uefi_variables partition. When I tried this myself, I found there are actually 3 partitions you need to erase:

  • uefi_variables
  • uefi_ftw
  • reserved_partition

To find the right regions to erase, capture your original flash log and look for a table like this:

[ 5.7482 ] partition_id partition_name StartingLba EndingLba
[ 5.7484 ] 1 BCT 0 2047
[ 5.7486 ] 2 A_mb1 2048 3071
[ 5.7487 ] 3 A_psc_bl1 3072 3583
[ 5.7489 ] 4 A_MB1_BCT 3584 3839
[ 5.7491 ] 5 A_MEM_BCT 3840 4351
[ 5.7493 ] 6 A_tsec-fw 4352 6399
[ 5.7495 ] 7 A_nvdec 6400 8447

Then find those 3 partitions I mentioned:

[ 5.7811 ] 50 uefi_variables 128000 128511
[ 5.7811 ] 51 uefi_ftw 128512 129535
[ 5.7811 ] 54 reserved_partition 130432 130559
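If the flash log is long, pulling those three rows out by hand is error-prone. Below is a minimal sketch, assuming the log was saved to a file (flash.log is a hypothetical name) and that each row contains the partition name followed by StartingLba and EndingLba, as in the excerpt above. The hypothetical helper print_erase_cmds reads the log on stdin and prints one dd command per partition, computing count as EndingLba - StartingLba + 1. If your log contains more than one partition table (for example, one per storage device), feed it only the QSPI table, since a name like reserved_partition could appear more than once.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): turn the partition table from a saved flash log
# into dd commands that fill uefi_variables, uefi_ftw and reserved_partition
# with 0xFF. Assumes each row contains "<name> <StartingLba> <EndingLba>".
set -euo pipefail

print_erase_cmds() {
  awk '
    BEGIN {
      want["uefi_variables"] = 1
      want["uefi_ftw"] = 1
      want["reserved_partition"] = 1
    }
    {
      for (i = 1; i + 2 <= NF; i++) {
        # Match the partition name, then require two numeric LBA fields after it.
        if (($i in want) && $(i + 1) ~ /^[0-9]+$/ && $(i + 2) ~ /^[0-9]+$/) {
          start = $(i + 1)
          count = $(i + 2) - $(i + 1) + 1   # the LBA range is inclusive
          printf "dd if=all_ff_binary.img of=QSPI0.img bs=512 seek=%d count=%d conv=notrunc\n", start, count
        }
      }
    }
  '
}
```

Running print_erase_cmds < flash.log on the example table prints the same three dd commands used below; review the output before running any of it against your image.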

First, create an image that contains all 0xFF bytes to use as the source for overwriting portions of your QSPI0.img.

This creates a 16 MB file of 0xFF bytes, larger than any partition we need to fill:
dd if=/dev/zero bs=1M count=16 | tr '\000' '\377' > all_ff_binary.img

Then you can modify those 3 portions of QSPI0.img:

dd if=all_ff_binary.img of=QSPI0.img bs=512 seek=128000 count=512 conv=notrunc
dd if=all_ff_binary.img of=QSPI0.img bs=512 seek=128512 count=1024 conv=notrunc
dd if=all_ff_binary.img of=QSPI0.img bs=512 seek=130432 count=128 conv=notrunc
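Before restoring, it can be worth confirming that the three regions really read back as 0xFF. This is a minimal sketch using only dd, tr and wc; check_erased is a hypothetical helper name, and the LBA values you pass in should come from your own flash log rather than the example table.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): check that an LBA range of an image reads back
# as all 0xFF bytes. Arguments: image, StartingLba, count (512-byte blocks).
set -eu

check_erased() {
  local img=$1 start=$2 count=$3 total leftover
  # Bytes actually read; a short read means the range runs past the end of the file.
  total=$(( $(dd if="$img" bs=512 skip="$start" count="$count" 2>/dev/null | wc -c) ))
  # Bytes left after deleting every 0xFF byte; 0 means the range is fully erased.
  leftover=$(( $(dd if="$img" bs=512 skip="$start" count="$count" 2>/dev/null | LC_ALL=C tr -d '\377' | wc -c) ))
  if [ "$total" -eq $(( count * 512 )) ] && [ "$leftover" -eq 0 ]; then
    echo "OK   $img LBA $start +$count"
  else
    echo "FAIL $img LBA $start +$count (read $total bytes, $leftover not 0xFF)"
    return 1
  fi
}
```

With the example table, check_erased QSPI0.img 128000 512 should print an OK line once the dd commands above have run; repeat it with 128512 1024 and 130432 128 for the other two partitions.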

Above I used a block size (bs) of 512 bytes to match how the LBAs are defined in the table, which reduces the amount of math you need to do. You can set seek to the StartingLba directly, and then you only need to calculate the count, which is easy.
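For the example table, the count arithmetic works out as follows. Because StartingLba and EndingLba are both inclusive, the number of 512-byte blocks is one more than their difference, and each region is far smaller than the 16 MB fill file:

$$
\begin{aligned}
\text{count} &= \text{EndingLba} - \text{StartingLba} + 1 \\
\texttt{uefi\_variables}: \quad 128511 - 128000 + 1 &= 512 \text{ blocks} = 256\ \text{KiB} \\
\texttt{uefi\_ftw}: \quad 129535 - 128512 + 1 &= 1024 \text{ blocks} = 512\ \text{KiB} \\
\texttt{reserved\_partition}: \quad 130559 - 130432 + 1 &= 128 \text{ blocks} = 64\ \text{KiB}
\end{aligned}
$$

These are the count values used in the three dd commands above.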

Finally, update nvpartitionmap.txt with the new sha256sum of the modified QSPI0.img. When you restore the modified image, UEFI will treat it like the very first boot.
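Because the dd edits change the file’s contents, its checksum changes too. Below is a minimal sketch, assuming the old QSPI0.img hash appears verbatim somewhere in nvpartitionmap.txt (the exact line format of that file is not shown in this thread, so check the result by hand). The hypothetical helper update_map_hash swaps the old hash for the new one with GNU sed -i. Record the old hash before running the dd commands, or take it from an untouched copy of the backup.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): replace the old checksum of an image with its
# current sha256sum inside a map file. Assumes the old hash appears verbatim.
set -euo pipefail

update_map_hash() {
  local map=$1 old_hash=$2 img=$3 new_hash
  # A sha256 digest is 64 lowercase hex characters, which is also safe in a sed pattern.
  if ! [[ $old_hash =~ ^[0-9a-f]{64}$ ]]; then
    echo "not a sha256 digest: $old_hash" >&2
    return 1
  fi
  if ! grep -q "$old_hash" "$map"; then
    echo "old hash not found in $map" >&2
    return 1
  fi
  new_hash=$(sha256sum "$img" | cut -d' ' -f1)
  sed -i "s/$old_hash/$new_hash/g" "$map"   # GNU sed in-place edit
  echo "$new_hash"
}
```

For example, run old=$(sha256sum QSPI0.img | cut -d' ' -f1) before the edits, then update_map_hash nvpartitionmap.txt "$old" QSPI0.img afterwards, and open nvpartitionmap.txt to confirm that only the QSPI0.img entry changed.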


This might be a little late, but for completeness: yes, I am using the backup and restore scripts included in the JetPack folder tools/backup_restore. They create a backup of everything, including the QSPI, as you found out.
I did find a workaround with DaveYYY’s suggestion to remove the network stack from the bootloader. (It seems I had not removed the correct component before, which is why it wasn’t working.) After doing that, well, it can’t boot from the network anymore, so the issue you found doesn’t matter. But this is not ideal.
I will give your solution a shot early next week and report back here. Thanks for sending those instructions!
