Xavier NX with A/B Redundancy

Hello,

I am trying to create an A/B redundancy system on a Xavier NX devkit with an attached NVMe SSD.

My environment is Ubuntu 20.04 with L4T R34.1.1.

I have had issues flashing the system with these commands from README_initrd_flash.txt:

ROOTFS_AB=1 ./tools/kernel_flash/l4t_initrd_flash.sh --no-flash jetson-xavier-nx-devkit-qspi internal
ROOTFS_AB=1 ./tools/kernel_flash/l4t_initrd_flash.sh --no-flash \
       --external-device nvme0n1 \
       -S 8GiB \
       -c ./tools/kernel_flash/flash_l4t_nvme_rootfs_ab_EL.xml \
       --append jetson-xavier-nx-devkit external
ROOTFS_AB=1 ./tools/kernel_flash/l4t_initrd_flash.sh --flash-only

The flash succeeded for the NVMe but failed for the QSPI. It reported that it could not initialize libmtd.

So I switched to flashing the NVMe part with l4t_initrd_flash.sh and the QSPI with flash.sh:

ROOTFS_AB=1 ./tools/kernel_flash/l4t_initrd_flash.sh \
      --external-device nvme0n1 \
      -S 8GiB \
      -c ./tools/kernel_flash/flash_l4t_nvme_rootfs_ab_EL.xml \
      --external-only jetson-xavier-nx-devkit \
      external
ROOTFS_AB=1 ./flash.sh -c ./bootloader/t186ref/cfg/flash_l4t_t194_spi_sd_p3668_rootfs_ab_EL.xml jetson-xavier-nx-devkit-qspi external

I modified the partition files to use the correct number of sectors and added the SMD_b partition.
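For the NVMe layout, `num_sectors` counts 512-byte sectors (matching `sector_size="512"`), so it is the drive size in bytes divided by 512. A minimal sketch, assuming you know the drive's size in bytes; the helper name `num_sectors_from_bytes` is hypothetical. For example, 250059350016 bytes gives 488397168, the value used in the NVMe layout posted later in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a drive size in bytes into the sector count
# used by num_sectors (512-byte sectors unless another size is given).
num_sectors_from_bytes() {
    local bytes=$1
    local sector_size=${2:-512}
    if (( bytes % sector_size != 0 )); then
        echo "size ${bytes} is not a multiple of ${sector_size}" >&2
        return 1
    fi
    echo $(( bytes / sector_size ))
}

# 250059350016 bytes -> 488397168 sectors
num_sectors_from_bytes 250059350016

# On the Jetson itself (as root), the kernel reports the same unit directly:
#   blockdev --getsz /dev/nvme0n1
```

`blockdev --getsz` always reports 512-byte units, so its output can be pasted into `num_sectors` when `sector_size` is 512.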

The system boots successfully, but I get an error when I check the A/B redundancy with nv_update_engine:

Nvidia A/B-Redundancy Update tool Version 2.0
Fail to open metadata file
Init SMD partition failed!
Fail to open metadata file
Init SMD partition failed!
verifying update
Verify bootloader update begins.
Unable to find kernel cmdline paramater boot.slot_suffix=
Error: Verify bootloader update failed!
Verify rootfs update begins.
Fail to open metadata file
RootFS A/B is not enabled, verification finishes.
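The bootloader verification fails because `boot.slot_suffix=` is not on the kernel command line, so it is worth confirming what the running kernel actually received. A minimal sketch, assuming it is run on the booted Jetson; `slot_suffix_from_cmdline` is a hypothetical helper that only parses the command-line string and does not query the bootloader.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of boot.slot_suffix= (possibly empty)
# from a kernel command line string; return 1 if the parameter is absent.
slot_suffix_from_cmdline() {
    local arg
    local -a args
    read -r -a args <<< "$1"     # split on whitespace without globbing
    for arg in "${args[@]}"; do
        case $arg in
            boot.slot_suffix=*)
                printf '%s\n' "${arg#boot.slot_suffix=}"
                return 0
                ;;
        esac
    done
    return 1
}

# On the booted Jetson: report whether the parameter is present.
if [ -r /proc/cmdline ]; then
    if suffix=$(slot_suffix_from_cmdline "$(cat /proc/cmdline)"); then
        echo "boot.slot_suffix is '${suffix}'"
    else
        echo "boot.slot_suffix= not found in /proc/cmdline"
    fi
fi
```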

Although smd_info.rootfs_AB.cfg and slot_metadata.bin.rootfsAB already existed, I also tried with newly created ones. The new file had the same size and also did not work.

What could I be missing, and what could cause the initrd flash to fail on the QSPI?

I am using the ubuntu-base 20.04.4 image with the packages from nvubuntu-focal-minimal-aarch64-packages installed; the binaries were applied afterwards.

Best Regards

Can I supply any more information that would help someone answer this?

Could you test what you want to do with JetPack 4.x first? We are not sure whether JP 5.0 has a bug or whether this issue is specific to your NVMe.

Hey Wayne,

Thanks for the answer. I had already wanted to test that. I tried running JetPack 4.x on Ubuntu 20.04, but the initrd flash script just exits with “Cleaning Up” and no error. I suppose it's not compatible at all and I'll have to set up an 18.04 system for it?

The issue does not seem to be related to the NVMe on 5.1, though: it flashed the NVMe correctly but reported that it can't initialize the MTD utilities to erase the QSPI flash. Since I also have an issue with the SMD partition in nv_update_engine, could my customized system be missing a program needed to access the QSPI flash? I noticed that the image creation copies binaries from my rootfs to build the initrd used by the initrd flash, so I got the impression that, even though my installed system reports mtd-utils as installed, some parts of it might be missing.

Something you can test before downgrading: could you remove ROOTFS_AB, flash again, and see whether it works?

I will try both when I am back in the office and report back. Sadly, only flash.sh works in WSL; the initrd flash only works on an actual PC for me :(

Hey Wayne,

I have tried flashing without the A/B option, and the result is the same. The initrd flash is not able to flash the QSPI:

blockdev: cannot open /dev/mmcblk0boot0: No such file or directory
Flash index file is /qspi/internal/flash.idx
Number of lines is 65
max_index=64
[ 0]: l4t_flash_from_kernel: Starting to flash to qspi
flash_erase: error!: can't initialize libmtd
[ 0]: l4t_flash_from_kernel: Error flashing qspi
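`flash_erase` fails while initializing libmtd, before it touches the flash. Assuming you can get a shell on the target (booted system or initrd), checking whether the kernel has registered any MTD device at all helps tell a missing device or driver apart from a broken userspace tool. A minimal sketch; `list_mtd_devices` is a hypothetical helper that only lists `/sys/class/mtd` entries, skipping the read-only `mtdNro` aliases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list MTD devices registered by the kernel.
# Returns 1 if none are found. The sysfs path can be overridden for testing.
list_mtd_devices() {
    local sysfs=${1:-/sys/class/mtd}
    local dev found=0
    for dev in "$sysfs"/mtd*; do
        [ -e "$dev" ] || continue        # glob matched nothing
        case ${dev##*/} in
            mtd*ro) continue ;;          # skip read-only aliases (mtd0ro, ...)
        esac
        printf '%s\n' "${dev##*/}"
        found=1
    done
    [ "$found" -eq 1 ]
}

if list_mtd_devices; then
    echo "MTD devices present (details in /proc/mtd)"
else
    echo "no MTD devices registered: the QSPI flash is not visible to the kernel"
fi
```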

I used the following commands:

PARTITION_FILE=nvme_rootfs.xml
./tools/kernel_flash/l4t_initrd_flash.sh --no-flash jetson-xavier-nx-devkit-qspi internal
./tools/kernel_flash/l4t_initrd_flash.sh --no-flash \
       --external-device nvme0n1 \
       -S 8GiB \
       --showlogs \
       -c $BASEDIR/$PARTITION_FILE \
       --append jetson-xavier-nx-devkit external
./tools/kernel_flash/l4t_initrd_flash.sh --showlogs --flash-only

This is the partition table file I used:

<?xml version="1.0"?>

<!-- Nvidia Tegra Partition Layout Version 1.0.0 -->

<partition_layout version="01.00.0000">
    <device type="nvme" instance="0" sector_size="512" num_sectors="488397168">
        <partition name="master_boot_record" type="protective_master_boot_record">
            <allocation_policy> sequential </allocation_policy>
            <filesystem_type> basic </filesystem_type>
            <size> 512 </size>
            <file_system_attribute> 0 </file_system_attribute>
            <allocation_attribute> 8 </allocation_attribute>
            <percent_reserved> 0 </percent_reserved>
            <description> **Required.** Contains protective MBR. </description>
        </partition>
        <partition name="primary_gpt" type="primary_gpt">
            <allocation_policy> sequential </allocation_policy>
            <filesystem_type> basic </filesystem_type>
            <size> 19968 </size>
            <file_system_attribute> 0 </file_system_attribute>
            <allocation_attribute> 8 </allocation_attribute>
            <percent_reserved> 0 </percent_reserved>
            <description> **Required.** Contains primary GPT of the `sdmmc_user` device. All
              partitions defined after this entry are configured in the kernel, and are
              accessible by standard partition tools such as gdisk and parted. </description>
        </partition>
        <partition name="APP" type="data">
            <allocation_policy> sequential </allocation_policy>
            <filesystem_type> basic </filesystem_type>
            <size> APPSIZE </size>
            <file_system_attribute> 0 </file_system_attribute>
            <allocation_attribute> 0x8 </allocation_attribute>
            <align_boundary> 4096 </align_boundary>
            <percent_reserved> 0 </percent_reserved>
            <filename> APPFILE </filename>
            <unique_guid> APPUUID </unique_guid>
            <description> **Required.** Contains the rootfs. This partition must be defined
              after `primary_GPT` so that it can be accessed as the fixed known special device
              `/dev/mmcblk0p1`. </description>
        </partition>
        <partition name="kernel" type="kernel" oem_sign="true">
            <allocation_policy> sequential </allocation_policy>
            <filesystem_type> basic </filesystem_type>
            <size> 67108864 </size>
            <file_system_attribute> 0 </file_system_attribute>
            <allocation_attribute> 8 </allocation_attribute>
            <percent_reserved> 0 </percent_reserved>
            <filename> LNXFILE </filename>
            <description> **Required.** Slot A; contains boot.img (kernel, initrd, etc)
              which is loaded in when cpu-bootloader fails to launch the kernel
              from the rootfs at `/boot`. </description>
        </partition>
        <partition name="kernel-dtb" type="kernel_dtb" oem_sign="true">
            <allocation_policy> sequential </allocation_policy>
            <filesystem_type> basic </filesystem_type>
            <size> 458752 </size>
            <file_system_attribute> 0 </file_system_attribute>
            <allocation_attribute> 8 </allocation_attribute>
            <percent_reserved> 0 </percent_reserved>
            <filename> DTB_FILE </filename>
            <description> **Required.** Slot A; contains kernel device tree blob. </description>
        </partition>
        <partition name="RECNAME" type="data" oem_sign="true">
            <allocation_policy> sequential </allocation_policy>
            <filesystem_type> basic </filesystem_type>
            <size> RECSIZE </size>
            <file_system_attribute> 0 </file_system_attribute>
            <allocation_attribute> 8 </allocation_attribute>
            <percent_reserved> 0 </percent_reserved>
            <filename> RECFILE </filename>
            <description> **Required.** Contains recovery image. </description>
        </partition>
        <partition name="RECDTB-NAME" type="data" oem_sign="true">
            <allocation_policy> sequential </allocation_policy>
            <filesystem_type> basic </filesystem_type>
            <size> 524288 </size>
            <file_system_attribute> 0 </file_system_attribute>
            <allocation_attribute> 8 </allocation_attribute>
            <percent_reserved> 0 </percent_reserved>
            <filename> RECDTB-FILE </filename>
            <description> **Required.** Contains recovery DTB image. </description>
        </partition>
        <partition name="BOOTCTRLNAME" type="data">
            <allocation_policy> sequential </allocation_policy>
            <filesystem_type> basic </filesystem_type>
            <size> 262144 </size>
            <file_system_attribute> 0 </file_system_attribute>
            <allocation_attribute> 8 </allocation_attribute>
            <percent_reserved> 0 </percent_reserved>
            <filename> BOOTCTRL-FILE </filename>
            <description> **Required.** Slot A; contains boot control data. </description>
        </partition>
        <partition name="RECROOTFS" type="data">
            <allocation_policy> sequential </allocation_policy>
            <filesystem_type> basic </filesystem_type>
            <size> RECROOTFSSIZE </size>
            <file_system_attribute> 0 </file_system_attribute>
            <allocation_attribute> 0x8 </allocation_attribute>
            <percent_reserved> 0 </percent_reserved>
            <description> **Optional.** Reserved for future use by the recovery filesystem;
              removable. </description>
        </partition>
        <partition name="esp" type="data">
            <allocation_policy> sequential </allocation_policy>
            <filesystem_type> basic </filesystem_type>
            <size> 67108864 </size>
            <file_system_attribute> 0 </file_system_attribute>
            <allocation_attribute> 0x8 </allocation_attribute>
            <percent_reserved> 0 </percent_reserved>
            <filename> ESP_FILE </filename>
            <partition_type_guid> C12A7328-F81F-11D2-BA4B-00A0C93EC93B </partition_type_guid>
            <description> **Required.** EFI system partition with L4T Launcher. </description>
        </partition>
        <partition name="UDA" type="data">
            <allocation_policy> sequential </allocation_policy>
            <filesystem_type> basic </filesystem_type>
            <size> 18432 </size>
            <file_system_attribute> 0 </file_system_attribute>
            <allocation_attribute> 0x808 </allocation_attribute>
            <percent_reserved> 0 </percent_reserved>
            <description> **Required.** Automatically takes all remaining space on the device except that
              occupied by the `secondary_gpt` partition. Allocation attribute must be set to 0x808.
              May be mounted and used to store user data. </description>
        </partition>
        <partition name="secondary_gpt" type="secondary_gpt">
            <allocation_policy> sequential </allocation_policy>
            <filesystem_type> basic </filesystem_type>
            <size> 0xFFFFFFFFFFFFFFFF </size>
            <file_system_attribute> 0 </file_system_attribute>
            <allocation_attribute> 8 </allocation_attribute>
            <percent_reserved> 0 </percent_reserved>
            <description> **Required.** Contains secondary GPT of the `sdmmc_user`
              device. </description>
        </partition>
    </device>
</partition_layout>

I am fairly sure my custom system is missing something related to mtd-utils. But which file could I have missed?

I am now trying the same with the samplefs to see if that makes any difference.

As I expected, the flash process works when I use the samplefs from here: Jetson Linux 34.1 | NVIDIA Developer

Which package could I have missed? libmtd-dev is installed.
I saw that the minimal package list in the samplefs folder pins specific versions of the packages, such as “mtd-utils=1:2.1.1-1ubuntu1”.
I changed my script to remove all those pins, so it just says “mtd-utils”, because the installation failed with errors about unavailable packages. Maybe I am now pulling in an incompatible package?
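Since the samplefs list pins exact versions, one way to see which packages drifted once the pins were removed is to compare each `package=version` entry with what dpkg reports as installed. A minimal sketch, assuming the list has one entry per line in the `mtd-utils=1:2.1.1-1ubuntu1` form (unpinned lines are skipped) and that it runs inside the target rootfs, e.g. via chroot; `installed_version` and `report_version_drift` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Print the installed version of a package, or nothing if it is not installed.
installed_version() {
    dpkg-query -W -f='${Version}' "$1" 2>/dev/null
}

# Read "package=version" lines from a file (or stdin) and report every
# package whose installed version differs from the pinned one.
# Returns 1 if any drift was found.
report_version_drift() {
    local line pkg want have drift=0
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%%#*}                 # drop comments
        line=${line//[[:space:]]/}       # drop whitespace
        case $line in
            *=*) ;;                      # only pinned entries are checked
            *) continue ;;
        esac
        pkg=${line%%=*}
        want=${line#*=}
        have=$(installed_version "$pkg")
        if [ "$have" != "$want" ]; then
            printf '%s: pinned %s, installed %s\n' "$pkg" "$want" "${have:-<none>}"
            drift=1
        fi
    done < "${1:-/dev/stdin}"
    return "$drift"
}

# Usage inside the rootfs (path to the package list is up to you):
#   report_version_drift /path/to/nvubuntu-focal-minimal-aarch64-packages
```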

@WayneWWW
I have found the issue.

The cause really was that I had removed the exact versions.
Instead of kmod 27-1ubuntu2 (https://ubuntu.pkgs.org/20.04/ubuntu-main-amd64/kmod_27-1ubuntu2_amd64.deb.html),
kmod_27-1ubuntu2.1_amd64.deb (Ubuntu 20.04 LTS) was installed.

This causes an issue because ZSTD support was enabled in this release. The supplied recovery_copy_binlist.txt does not include libzstd, but it does copy the kmod binary, which requires it. As a result, kernel module loading is broken in the initrd.
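Because the initrd is built by copying the files listed in recovery_copy_binlist.txt, a binary whose shared libraries are not copied along with it works in the full rootfs but breaks inside the initrd. A minimal sketch for catching this class of problem: resolve a binary's libraries with `ldd` and report any whose file name does not appear in the binlist. Assumptions: it runs on an aarch64 system or chroot where `ldd` can inspect the target binaries, the binlist mentions each copied file by path somewhere on a line (its exact line format is not checked), and `libs_from_ldd` / `missing_from_binlist` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Extract resolved library paths from ldd output read on stdin.
libs_from_ldd() {
    awk '
        $2 == "=>" && $3 ~ /^\// { print $3; next }   # libfoo.so.1 => /path (0x...)
        $1 ~ /^\// { print $1 }                          # dynamic loader line
    '
}

# Read library paths on stdin; print those whose file name does not
# appear as a whole word anywhere in the given binlist file.
missing_from_binlist() {
    local binlist=$1 lib
    while IFS= read -r lib; do
        grep -qwF -- "${lib##*/}" "$binlist" || printf '%s\n' "$lib"
    done
}

# Example (on the Jetson or in an aarch64 chroot of the rootfs); point the
# last argument at your copy of recovery_copy_binlist.txt:
#   ldd "$(command -v kmod)" | libs_from_ldd | missing_from_binlist /path/to/recovery_copy_binlist.txt
```

Matching on the file name rather than the full path keeps the check from tripping over `/lib` versus `/usr/lib` spellings of the same library.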