Orin NX - backup restore not working

uefi_Jetson_RELEASE.bin (3.2 MB)

Please format it as ext4 before use to check whether it helps.

I’ve also verified that this use case (restoring onto an empty NVMe SSD) works on my setup.
Please refer to the following logs:
host_restore.log (115.4 KB)
host_backup.log (117.4 KB)
serial_restore.log (154.7 KB)
and try to find the differences from yours.
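
For reference, the backup and restore on my side were run roughly as follows. This is only a sketch: it assumes the standard l4t_backup_restore.sh flow from the Linux_for_Tegra directory, and <board-config> is a placeholder for the board config you use for flashing.

  # Put the device into recovery mode, then back up the external NVMe
  sudo ./tools/backup_restore/l4t_backup_restore.sh -e nvme0n1 -b <board-config>

  # Put the target (with the empty NVMe) into recovery mode, then restore
  sudo ./tools/backup_restore/l4t_backup_restore.sh -e nvme0n1 -r <board-config>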

Hi KevinFFF

Are you sure that the provided UEFI binary is the debug binary and not the release one?
Here are the serial logs from the NVMe that does not boot and from one that does:
log_booting.txt (47.6 KB)
log_not_booting.txt (42.5 KB)

There is no message from the L4TLauncher in the case where the system does not boot.

Formatting the NVMe to ext4 beforehand does not help. When it is in the factory state, gdisk yields:

gdisk /dev/nvme0n1
Partition table scan:
  MBR: MBR only
  BSD: not present
  APM: not present
  GPT: not present
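
For reference, the formatting was done roughly along these lines (a sketch; we assume the drive shows up as /dev/nvme0n1 and contains no data worth keeping):

  # wipe the old MBR signatures and create a single ext4 filesystem on the whole device
  sudo wipefs -a /dev/nvme0n1
  sudo mkfs.ext4 /dev/nvme0n1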

We also checked the log files and do not see any obvious difference.
Thank you.

Sorry for uploading the wrong one; please use the following one instead.
uefi_Jetson_DEBUG.bin (3.2 MB)

Hi KevinFFF

Please find the log files for the non-working and the working case:
log_debug_uefi_nok.txt (203.4 KB)
log_debug_uefi_ok.txt (143.6 KB)

The only difference that catches our eye is that in the boot order, the 240 GB PCIe drive has ID 0x0009 instead of 0x0001 as in the working case.
Can you check?

With the debug UEFI binary, the system resets and does not enter the EFI Shell.
Thank you.

I think it should be fine since it is just the enumeration index.

I’ve checked the logs you shared, but there are no apparent errors in your NOK case; it just gets stuck.

May I know how you got log_debug_uefi_ok.txt?
Is it the log you captured before performing restore?

Have you also tried these 2 methods to see whether they cause a different result?

Hi KevinFFF

The log_debug_uefi_ok.txt is from the same type of SSD from which we took the backup image. As said, once we have flashed the SSD with the kernel_flash tool, restoring to the same SSD also works.

Formatting the NVMe SSD as ext4 did not help. We also compared the partition tables of the working and non-working cases and do not see any difference.
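
For the partition table comparison we simply dumped and diffed the tables, roughly like this (a sketch; sgdisk comes with the gdisk package):

  # dump the GPT of each drive and compare the dumps
  sudo sgdisk -p /dev/nvme0n1 > parttable_working.txt   # on the working drive
  sudo sgdisk -p /dev/nvme0n1 > parttable_nok.txt       # on the non-working drive
  diff parttable_working.txt parttable_nok.txt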

We tested SSDs from other manufacturers and do not see an issue with restoring images there. However, we need a solution for the current NVMe.

Thank you.

Hi KevinFFF

The issue seems to be related to the file BOOTAA64.efi or the EFI partition. We replaced this file on the EFI partition of the non-booting NVMe with the one inside the Linux_for_Tegra/bootloader/ folder, and afterwards the system booted.
Any idea where this could go wrong?
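
In case it helps to reproduce, the replacement was done roughly as follows (a sketch; nvme0n1p10 is the EFI system partition in our layout, /mnt/esp is just a temporary mount point, and the usual EFI/BOOT/ path is assumed):

  # mount the EFI system partition of the non-booting NVMe
  sudo mkdir -p /mnt/esp
  sudo mount /dev/nvme0n1p10 /mnt/esp

  # replace the restored BOOTAA64.efi with the one shipped in the BSP
  sudo cp Linux_for_Tegra/bootloader/BOOTAA64.efi /mnt/esp/EFI/BOOT/BOOTAA64.efi
  sudo umount /mnt/esp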

Hi KevinFFF

We did further tests and here are the results:

  1. Flashing with the kernel_flash tool → BOOTAA64.efi is identical to the one in the bootloader folder
  2. Backing up and restoring the same working SSD → BOOTAA64.efi is identical to the one in the bootloader folder
  3. Restoring an empty SSD → BOOTAA64.efi differs from the one in the bootloader folder

Now the question is whether the modification happens during the creation of the backup image or during the restore. The other question is why this modification does not happen when the SSD has once been flashed with the kernel_flash tool.

On older JetPack releases, we could inspect the backup partitions with 7zip. This no longer seems possible with JP6.1, so we cannot check whether nvme0n1p10_bak.img already contains a changed BOOTAA64.efi file.

Let us know if you need further information.
Thank you for your help.

Do you mean that the issue is specific to the current NVMe SSD?
If so, could you share the details of your NVMe?

How did you check whether they are identical or different?
How could it happen that the BOOTAA64.efi on the NVMe is different from the one in the backup package?

The backup/restore flow should be similar in JP5 and JP6.

Hi KevinFFF

The SSDs are Apacer PV920-M280.
At the moment, we only see the issue with this type, but we only had 2 other types at hand.

We just compare the files with “diff” to see if they are identical or not.
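
Roughly like this, with the EFI partition of the NVMe mounted (a sketch; /mnt/esp and the EFI/BOOT/ path are assumptions based on our setup):

  # "Binary files ... differ" vs. no output tells us whether the two files match
  diff /mnt/esp/EFI/BOOT/BOOTAA64.efi Linux_for_Tegra/bootloader/BOOTAA64.efi
  sha256sum /mnt/esp/EFI/BOOT/BOOTAA64.efi Linux_for_Tegra/bootloader/BOOTAA64.efi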

We are not sure whether BOOTAA64.efi is already different in the backup package or whether it gets changed during the restore process, as we have no way to check the backup partition. However, since restoring to other SSDs works, the backup image should be OK, and the error seems to happen during restore.

Overall, we have no clue how this can happen, especially as it only occurs if we do not use the kernel_flash tool before restoring the device.
Do you have any advice how we can further investigate this?

Thank you.

Hi KevinFFF

We finally found the solution. In the script “nvrestore_partition.sh”, we had to remove the argument “conv=sparse”:

zstd -dc "${FIELDS[1]}" | dd of="/dev/${FIELDS[2]}" status=progress conv=sparse bs=512 seek=$((FIELDS[3])) count=$((FIELDS[4]))

change to

zstd -dc "${FIELDS[1]}" | dd of="/dev/${FIELDS[2]}" status=progress bs=512 seek=$((FIELDS[3])) count=$((FIELDS[4]))

To our understanding, this argument is meant for image creation. Can you check why it is used for restoring the partition? Any idea what goes wrong when using it?
Thank you.

Kind regards


Do you mean that the issue is specific to one type of SSD (Apacer PV920-M280)?

It seems you’ve found the workaround for your use case.
We use zstd to decompress the backup images (nvme0n1p*_bak.img) and pass them to dd to write into /dev/nvme0n1.
conv=sparse is used to optimize the restore process and the size.
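
As a side note, this also means you could inspect the content of a backup image on the host by decompressing it first and loop-mounting the raw partition, roughly as follows (a sketch; /mnt/esp_bak is just a temporary mount point):

  # decompress the zstd-compressed backup of the EFI partition to a raw image
  zstd -dc nvme0n1p10_bak.img > esp_bak.raw

  # loop-mount it read-only and check the BOOTAA64.efi inside
  sudo mkdir -p /mnt/esp_bak
  sudo mount -o loop,ro esp_bak.raw /mnt/esp_bak
  ls -l /mnt/esp_bak/EFI/BOOT/
  sudo umount /mnt/esp_bak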

Yes, at the moment the issue is specific to one type of SSD. However, we only had two other types at hand for testing, so other SSDs could also be affected.

In JetPack 5.1.2, the option “conv=sparse” was not used in the restore process.

Hi, we found a bug in the implementation when the conv=sparse argument was added. With conv=sparse, the tool skips writing the all-zero content from the restore image to the storage device. If the storage device is not completely clean before restoring, it will therefore retain contents from before the restore process.

So for now it is advised to remove conv=sparse as a workaround (WAR). In the future, we will update the tool to fix this.
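
For anyone who wants to see the effect, here is a small scratch-file demo of the behaviour described above (a made-up illustration, not the actual tool; all filenames are invented):

  # 1 MiB "device" full of 0xFF bytes, standing in for leftover data on the SSD
  dd if=/dev/zero bs=1M count=1 | tr '\0' '\377' > scratch_dev.bin

  # 1 MiB all-zero "restore image", standing in for zero blocks in the backup
  dd if=/dev/zero of=restore_img.bin bs=1M count=1

  # with conv=sparse the zero blocks are skipped, so the old 0xFF bytes survive
  dd if=restore_img.bin of=scratch_dev.bin bs=512 conv=sparse,notrunc
  hexdump -C scratch_dev.bin | head -n 2   # still shows ff ff ff ...

  # without conv=sparse the zeroes are actually written
  dd if=restore_img.bin of=scratch_dev.bin bs=512 conv=notrunc
  hexdump -C scratch_dev.bin | head -n 2   # now shows 00 00 00 ...

Presumably, fully wiping the target device (for example with blkdiscard) before restoring would avoid the stale-content problem as well, but removing conv=sparse as described above is the simpler workaround for now.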

