SMD Partition Corrupted after Flash

For some time now, both my primary and secondary SMD partitions have been corrupted despite numerous full flashes. I am not sure what the root cause of the corruption was, but I assumed that a full flash would resolve the issue.

root@ws-nxcore:/etc/waggle# nvbootctrl dump-slots-info
primary SMD is corrupted!
secondary SMD is corrupted!
# nvbootctrl get-number-slots
primary SMD is corrupted!
secondary SMD is corrupted!
-5
root@ws-nxcore:~# nvbootctrl get-suffix 0

root@ws-nxcore:~# nvbootctrl get-suffix 1
_b

I am not 100% sure when this started occurring (as I don’t check the SMD partition often), but I believe it was when I started using the ./nvmassflashgen.sh script to generate MFI builds. Could the BOARDID, etc. parameters supplied to ./nvmassflashgen.sh have something to do with this? I have always wondered what the proper values should be for a non-devkit Xavier NX. The README_Massflash.txt file specifies that for jetson-xavier-nx-devkit-emmc they should be:

BOARDID=3668
BOARDSKU=0001
FAB=100
BOARDREV=N/A

Since I am using a ConnectTech Photon carrier board, I chose different values:

BOARDID="NGX003" BOARDSKU="0000" FAB="000" \
BOARDREV="E.0" FUSELEVEL="fuselevel_production" \
./nvmassflashgen.sh waggle_photon mmcblk0p1

However, attempting to recover the system by executing a full flash (sudo ./flash.sh cti/xavier-nx/photon mmcblk0p1) did not resolve my issue. I got similar errors:

jswantek@jswantek-desktop:~$ nvbootctrl dump-slots-info
Fail to open metadata file
Init SMD partition failed!
Fail to open metadata file
jswantek@jswantek-desktop:~$ nvbootctrl get-number-slots
Fail to open metadata file
Init SMD partition failed!
Fail to open metadata file
-5
jswantek@jswantek-desktop:~$ nvbootctrl get-suffix 0
Fail to open metadata file
Init SMD partition failed!

jswantek@jswantek-desktop:~$ nvbootctrl get-suffix 1
Fail to open metadata file
Init SMD partition failed!
_b
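
For anyone triaging similar output, the messages in the two transcripts above can be classified mechanically. The following is a minimal Python sketch, assuming the exact message strings shown in this post; the function name `classify_nvbootctrl_output` is hypothetical and is not part of any NVIDIA tool.

```python
def classify_nvbootctrl_output(output):
    """Flag the SMD-related error messages seen in captured nvbootctrl output."""
    # Compare whole lines so that partial matches in other text are ignored.
    lines = {line.strip() for line in output.splitlines()}
    return {
        "primary_corrupted": "primary SMD is corrupted!" in lines,
        "secondary_corrupted": "secondary SMD is corrupted!" in lines,
        "init_failed": "Init SMD partition failed!" in lines,
        "metadata_unreadable": "Fail to open metadata file" in lines,
    }


# Output of `nvbootctrl get-number-slots` copied from the first transcript.
sample = """primary SMD is corrupted!
secondary SMD is corrupted!
-5
"""
status = classify_nvbootctrl_output(sample)
print(status)
```

The text would have to be captured from the device first (for example by redirecting the command output to a file); the sketch only inspects strings and never calls nvbootctrl itself.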

Attached you will find the bootloader/smd_info.cfg and bootloader/slot_metadata.bin files (which are from the stock L4T 32.4.4 NVIDIA BSP).
smd_info.cfg (1.6 KB)
slot_metadata.bin (22 Bytes)

Also attached are my flash.sh log output (from the full ./flash.sh flash mentioned above) and the serial terminal output during flashing and first boot.
ctibase_flashsh.txt (44.7 KB)
serial_ctibase_flashsh.txt (37.0 KB)

I am looking for any information to help me resolve this issue and any ideas on why this issue occurred in the first place.

Thanks,
Joe

p.s. I am using a ConnectTech Photon carrier board (https://connecttech.com/product/photon-jetson-nano-ai-camera-platform/) which is why my above flash command target is not the devkit.

hello joseph.swantek,

May I know what modifications you have made?
Did you enable the Bootloader Update and Redundancy feature?
Please refer to the developer guide, Bootloader Update and Redundancy.

Also, board information is stored in the module EEPROM (see Jetson Module EEPROM Layout); you can check it as follows,
for example, $ i2cdump -f -y 0 0x50
thanks

Hi joseph.swantek -

Have you been in contact with our Tech Team? If you fill out our support form (found here) our team will help you troubleshoot and get you up and running.

Let me know if you have any questions!
Jacki

Thanks for the reply Jacki. I have been in contact with your tech support team on numerous occasions for various topics including pad 9 customization.

If I have further questions I will be sure to reach out.

Thanks,
Joe

Hey Jerry,

Thanks for the quick reply

In my original post I attached the smd_info.cfg with my SMD configuration. That file, together with the slot_metadata.bin binary that is flashed to the SMD and SMD_b partitions, is configured with A/B support disabled and redundancy disabled. This is the stock configuration included in the L4T download from the NVIDIA site (L4T R32.4.4 archive | NVIDIA Developer).

In the past I have successfully used a modified smd_info.cfg (1.6 KB) (this file has < REDUNDANCY_USER 1 >) and slot_metadata.bin to enable A/B support and redundancy:

  1. Create slot_metadata.bin with the following command: sudo ./nv_smd_generator smd_info.cfg slot_metadata.bin

  2. Put the slot_metadata.bin into the Linux_for_Tegra/bootloader/ directory on my x86 flashing machine

  3. flash my device using the ./flash.sh command

  4. then I saw that redundancy was enabled

    root@nx-sample-token:~# nvbootctrl dump-slots-info
    magic:0x43424e00, version: 3 features: 3 num_slots: 2
    slot: 0, priority: 15, suffix: _a, retry_count: 7, boot_successful: 1
    slot: 1, priority: 14, suffix: _b, retry_count: 7, boot_successful: 1

  5. I could then execute commands like nvbootctrl set-active-boot-slot 1, nvbootctrl dump-slots-info, nv_update_engine -d (to disable redundancy), nvbootctrl set-active-boot-slot 0, nv_update_engine -v, and nv_update_engine -e, and everything worked well (verified via nvbootctrl dump-slots-info).
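
To compare slot state between flashes, the dump-slots-info output shown in step 4 can be turned into plain data. This is a rough Python sketch that assumes the `key: value` layout seen in that output; `parse_slots_info` is a hypothetical helper name, and the sketch does not try to interpret what the `features` value means.

```python
import re

# Matches "key: value" and "key:value" pairs; values end at a comma or whitespace.
PAIR = re.compile(r"(\w+):\s*([^,\s]+)")


def _coerce(value):
    """Convert decimal or 0x-prefixed numbers to int, leave other values as text."""
    try:
        return int(value, 0)
    except ValueError:
        return value


def parse_slots_info(output):
    """Split nvbootctrl dump-slots-info text into a header dict and a slot list."""
    header, slots = {}, []
    for line in output.splitlines():
        pairs = {key: _coerce(value) for key, value in PAIR.findall(line)}
        if "magic" in pairs:
            header = pairs
        elif "slot" in pairs:
            slots.append(pairs)
    return header, slots


# Output copied from step 4 above.
sample = """magic:0x43424e00, version: 3 features: 3 num_slots: 2
slot: 0, priority: 15, suffix: _a, retry_count: 7, boot_successful: 1
slot: 1, priority: 14, suffix: _b, retry_count: 7, boot_successful: 1
"""
header, slots = parse_slots_info(sample)
print(header["num_slots"], [slot["suffix"] for slot in slots])
```

An empty header together with an empty slot list would then be a simple signal that the tool printed only error messages.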

At some point between the above steps and the below steps my SMD became corrupt and I have been unable to resolve it.

The steps I am now performing to enable A/B support and bootloader redundancy are the following:

  1. Create an smd_info.cfg that enables A/B support with bootloader redundancy: smd_info.cfg (1.6 KB)
  2. Copy the smd_info.cfg to Linux_for_Tegra/bootloader/smd_info.cfg on my x86 flashing machine
  3. Initiate my massflash image creation (per my original post). (I later realized that this did not create the new Linux_for_Tegra/bootloader/slot_metadata.bin with my settings, but was instead using the stock slot_metadata.bin provided by the L4T BSP download archive)
  4. Flash my device with the resulting mass flash (sudo ./nvmflash.sh)

But for some reason my SMD remains corrupt, as stated in my original post, and I don’t know why or how to recover it.

As for the i2cdump, thanks for that information. I was not aware there was a small EEPROM device. I performed the dump and it output the following, which I will look at in more depth later. One question: the EEPROM documentation you linked says the EEPROM is on bus 2, but your command uses bus 0 (which works, see below). So I assume it’s bus 0 for NX and not bus 2? Also, I assume (for a different reason) this is accessible in the CBoot bootloader?

jswantek@jswantek-desktop:~$ i2cdump -f -y 0 0x50
No size specified (using byte-data access)
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 01 00 fc 00 54 0e 01 00 02 47 00 00 00 00 00 00    ?.?.T??.?G......
10: 00 00 00 00 36 39 39 2d 31 33 36 36 38 2d 30 30    ....699-13668-00
20: 30 31 2d 32 30 30 20 47 2e 30 00 00 00 00 00 00    01-200 G.0......
30: 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
40: ff ff ff ff a4 a0 05 2d b0 48 31 34 32 31 33 32    ....???-?H142132
50: 30 30 30 34 36 34 39 00 00 00 00 00 00 00 00 00    0004649.........
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
90: 00 00 00 00 00 00 4e 56 43 42 1c 00 4d 31 00 00    ......NVCB?.M1..
a0: ff ff ff ff ff ff ff ff ff ff ff ff a4 a0 05 2d    ............???-
b0: b0 48 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ?H..............
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ee    ...............?
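
The ASCII column of the dump shows a readable string starting at offset 0x14. Below is a minimal Python sketch that rebuilds the bytes from the dump text and extracts that string. The helper names are hypothetical, the offset 0x14 is simply what this particular dump shows, and the mapping of the string's fields to BOARDID/BOARDSKU/FAB/BOARDREV is an assumption (it happens to line up with the TNSPEC value 3668-200-0001-G.0 reported later in this thread), not something taken from the EEPROM layout document.

```python
import re

# First rows of the dump above, copied verbatim (hex columns plus ASCII column).
DUMP = """\
00: 01 00 fc 00 54 0e 01 00 02 47 00 00 00 00 00 00    ?.?.T??.?G......
10: 00 00 00 00 36 39 39 2d 31 33 36 36 38 2d 30 30    ....699-13668-00
20: 30 31 2d 32 30 30 20 47 2e 30 00 00 00 00 00 00    01-200 G.0......
30: 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff    ................
"""

# Assumed field layout of the part-number string (labelled as an assumption).
PART_RE = re.compile(r"\d{3}-\d(\d{4})-(\d{4})-(\d{3}) (\S+)")


def parse_i2cdump(text):
    """Return a 256-byte image built from the 'NN: xx xx ...' rows of an i2cdump."""
    image = bytearray(256)
    for line in text.splitlines():
        addr, sep, rest = line.partition(":")
        tokens = rest.split()[:16]  # the 16 hex bytes come before the ASCII column
        if not sep or len(tokens) < 16:
            continue
        try:
            base = int(addr.strip(), 16)
            row = bytes(int(tok, 16) for tok in tokens)
        except ValueError:
            continue  # not a data row (prompt, header, or unreadable bytes)
        if 0 <= base and base + 16 <= len(image):
            image[base:base + 16] = row
    return bytes(image)


def ascii_field(image, start):
    """Read a NUL-terminated ASCII string starting at byte offset `start`."""
    end = image.find(b"\x00", start)
    if end < 0:
        end = len(image)
    return image[start:end].decode("ascii", errors="replace")


def split_part_number(part_number):
    """Split the part-number string into the assumed flash parameters."""
    match = PART_RE.fullmatch(part_number)
    if match is None:
        return None
    board_id, sku, fab, rev = match.groups()
    return {"BOARDID": board_id, "BOARDSKU": sku, "FAB": fab, "BOARDREV": rev}


part_number = ascii_field(parse_i2cdump(DUMP), 0x14)
fields = split_part_number(part_number)
print(part_number, fields)
```

If that reading is right, it would suggest the module itself reports 3668/0001/200/G.0 rather than the NGX003/0000/000/E.0 values from the original post, but that should be confirmed against the EEPROM layout documentation.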

hello joseph.swantek,

I would like to narrow down the issue.
Are you able to follow the developer guide, Bootloader Update and Redundancy?
Please generate the bootloader update payload with the build_l4t_bup.sh script.

You can check the device spec info in the target device’s /etc/nv_boot_control.conf.
Here is an example command for your reference,
i.e. $ sudo FAB=400 BOARDID=2888 FUSELEVEL=fuselevel_production ./build_l4t_bup.sh jetson-xavier mmcblk0p1

Hi Jerry,

I would be more than happy to take whatever steps are needed to narrow down the issue. Here are the results from my testing:

root@ws-nxcore:~# cat /etc/nv_boot_control.conf
TNSPEC 3668-200-0001-G.0-1-0-waggle_photon-mmcblk0p1
TEGRA_CHIPID 0x19
TEGRA_OTA_BOOT_DEVICE /dev/mmcblk0boot0
TEGRA_OTA_GPT_DEVICE /dev/mmcblk0boot1
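
The TNSPEC line packs several flash parameters into one string. Here is a small Python sketch that splits it up. The field order (board ID, FAB, SKU, revision, fuse level, chip revision, target, root device) is an assumption inferred from the values in this thread, and `parse_boot_control_conf` and `split_tnspec` are hypothetical helper names.

```python
# Assumed order of the first six dash-separated TNSPEC fields.
LEADING_FIELDS = ("board_id", "fab", "board_sku", "board_rev", "fuselevel", "chip_rev")


def parse_boot_control_conf(text):
    """Parse 'KEY value' lines of nv_boot_control.conf into a dict."""
    conf = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def split_tnspec(tnspec):
    """Split a TNSPEC string into named fields (field order is an assumption)."""
    parts = tnspec.split("-")
    if len(parts) < len(LEADING_FIELDS) + 2:
        raise ValueError("unexpected TNSPEC format: " + tnspec)
    spec = dict(zip(LEADING_FIELDS, parts))
    # The target name may itself contain dashes, so take the root device from
    # the end and treat everything in between as the target.
    spec["rootdev"] = parts[-1]
    spec["target"] = "-".join(parts[len(LEADING_FIELDS):-1])
    return spec


# Contents copied from the output above.
sample = """TNSPEC 3668-200-0001-G.0-1-0-waggle_photon-mmcblk0p1
TEGRA_CHIPID 0x19
TEGRA_OTA_BOOT_DEVICE /dev/mmcblk0boot0
TEGRA_OTA_GPT_DEVICE /dev/mmcblk0boot1
"""
conf = parse_boot_control_conf(sample)
spec = split_tnspec(conf["TNSPEC"])
print(spec)
```

This is where the FAB=200 and BOARDID=3668 values used in the build_l4t_bup.sh command below come from.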

Executing the build_l4t_bup.sh: build_l4t_bup.txt (37.3 KB)

$ find bootloader/ | grep bl_update
$
$ sudo FAB=200 BOARDID=3668 FUSELEVEL=fuselevel_production ./build_l4t_bup.sh waggle_photon mmcblk0p1
...
Payloads saved to "/home/jswantek/workspace/custom_builds/eeprom07/full/bootloader/payloads_t19x/"

Upon attempting to flash these:

root@ws-nxcore:/opt/ota_package# ls -la bl_update_payload
-rw-r--r-- 1 root root 47658847 Mar  8 22:08 bl_update_payload
root@ws-nxcore:~# nv_update_engine --install
Nvidia A/B-Redundancy Update tool Version 1.2
primary SMD is corrupted!
secondary SMD is corrupted!
A/B has been disabled. Need to enable A/B.

Sidenote:
You will notice that I am building with a custom target waggle_photon. Here are the contents of my waggle_photon.conf file:

source "${LDK_DIR}/p3668.cti-base.common";
DTB_FILE=tegra194-xavier-nx-cti-NGX003-WAGGLE-WS.dtb;
EMMC_CFG=flash_waggle_l4t_t194_spi_emmc_p3668.xml;
EMMCSIZE=17179869184;
ROOTFSSIZE=10GiB;

The only changes to flash_waggle_l4t_t194_spi_emmc_p3668.xml compared to the stock flash_l4t_t194_spi_emmc_p3668.xml are the following:

jswantek@ubuntu-laptop:~/workspace/custom_builds/eeprom07/full/bootloader/t186ref/cfg$ diff flash_waggle_l4t_t194_spi_emmc_p3668.xml flash_l4t_t194_spi_emmc_p3668.xml
628,638d627
<         <partition name="WAGGLE-RPI" type="data">
<             <allocation_policy> sequential </allocation_policy>
<             <filesystem_type> basic </filesystem_type>
<             <size> 4294967296 </size>
<             <file_system_attribute> 0 </file_system_attribute>
<             <allocation_attribute> 0x8 </allocation_attribute>
<             <align_boundary> 4096 </align_boundary>
<             <percent_reserved> 0 </percent_reserved>
<             <filename> waggle-rpi.img </filename>
<             <description> **Required.** Contains the Waggle Raspberry PI PXE boot filesystem </description>
<         </partition>

Now, the deltas between p3668.conf.common and p3668.cti-base.common are a bit more interesting:

jswantek@ubuntu-laptop:~/workspace/custom_builds/eeprom07/full$ diff p3668.conf.common p3668.cti-base.common
1c1
< # Copyright (c) 2019-2020, NVIDIA CORPORATION. All rights reserved.
---
> # Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
42,55d41
< # Process_board_version:
< # Trigger to read the board id and board version from EEPROM on main board.
< # undef for non eeprom boards.
< process_board_version()
< {
< 	local board_id="${1}";
< 	local board_version="${2}";
< 	local board_sku="${3}";
< 	local board_revision="${4}";
< 	local chiprev="${5}";
<
< 	print_board_version "${board_id}" "${board_version}" "${board_sku}" "${board_revision}" "${chiprev}"
< }
<
101c87
< DTB_FILE=tegra194-p3668-all-p3509-0000.dtb;
---
> #DTB_FILE=tegra194-p3668-all-p3509-0000.dtb;
141c127
< PINMUX_CONFIG="tegra19x-mb1-pinmux-p3668-a01.cfg";
---
> PINMUX_CONFIG="tegra19x-xavier-nx-cti-mb1-pinmux-p3668-a01.cfg";
143c129
< PMC_CONFIG="tegra19x-mb1-padvoltage-p3668-a01.cfg";
---
> PMC_CONFIG="tegra19x-mb1-padvoltage-p3668-0001-a00.cfg";
158,159d143
< OTA_BOOT_DEVICE="/dev/mtdblock0";
< OTA_GPT_DEVICE="/dev/mtdblock0";

Could the missing OTA_BOOT_DEVICE and OTA_GPT_DEVICE settings be relevant to my problem?
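
A check like the diff above can be automated to catch variables that the stock file assigns but a vendor copy drops. This is a minimal Python sketch, assuming simple `NAME=value` shell assignments at the start of a line; `missing_vars` is a hypothetical helper, and the two sample texts are shortened excerpts based on the diff, not the full files.

```python
import re

# A shell-style assignment at the start of a line; commented-out lines do not match.
ASSIGN = re.compile(r"^[ \t]*([A-Za-z_][A-Za-z0-9_]*)=", re.MULTILINE)


def assigned_vars(conf_text):
    """Return the set of variable names assigned in a board .conf file."""
    return set(ASSIGN.findall(conf_text))


def missing_vars(reference_text, candidate_text):
    """List variables assigned in the reference file but not in the candidate."""
    return sorted(assigned_vars(reference_text) - assigned_vars(candidate_text))


# Shortened excerpts reconstructed from the diff above.
stock = '''DTB_FILE=tegra194-p3668-all-p3509-0000.dtb;
PINMUX_CONFIG="tegra19x-mb1-pinmux-p3668-a01.cfg";
PMC_CONFIG="tegra19x-mb1-padvoltage-p3668-a01.cfg";
OTA_BOOT_DEVICE="/dev/mtdblock0";
OTA_GPT_DEVICE="/dev/mtdblock0";
'''
cti = '''#DTB_FILE=tegra194-p3668-all-p3509-0000.dtb;
PINMUX_CONFIG="tegra19x-xavier-nx-cti-mb1-pinmux-p3668-a01.cfg";
PMC_CONFIG="tegra19x-mb1-padvoltage-p3668-0001-a00.cfg";
'''
dropped = missing_vars(stock, cti)
print(dropped)
```

The sketch ignores function definitions such as process_board_version(), so it only covers the variable half of the diff.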

I also noticed that tegra19x-mb1-padvoltage-p3668-0001-a00.cfg doesn’t exist. I assume this is a mistake in the ConnectTech p3668.cti-base.common file and the value should instead be PMC_CONFIG="tegra19x-mb1-padvoltage-p3668-a01.cfg"; (FYI @jross, this is in your CTI-L4T-XAVIER-NX-32.4.4-V005 BSP drop. Why doesn’t the p3668.cti-base.common file just source p3668.conf.common and make only the needed changes?)
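
A missing file like this can be caught before flashing. The sketch below, in Python, scans a .conf text for `*_CONFIG="..."` assignments and reports any whose file is absent from a given directory. `missing_config_files` is a hypothetical helper, and the directory to search is left as a parameter because where the BSP keeps these files is not stated in this thread; the demo uses a temporary directory as a stand-in.

```python
import os
import re
import tempfile

# Quoted *_CONFIG assignments, e.g. PMC_CONFIG="name.cfg";
CONFIG_REF = re.compile(r'^[ \t]*(\w+_CONFIG)="([^"]+)";?', re.MULTILINE)


def missing_config_files(conf_text, cfg_dir):
    """Map each *_CONFIG variable to its file name if that file is not in cfg_dir."""
    return {
        var: name
        for var, name in CONFIG_REF.findall(conf_text)
        if not os.path.isfile(os.path.join(cfg_dir, name))
    }


# The two assignments from p3668.cti-base.common shown in the diff above.
conf_text = '''PINMUX_CONFIG="tegra19x-xavier-nx-cti-mb1-pinmux-p3668-a01.cfg";
PMC_CONFIG="tegra19x-mb1-padvoltage-p3668-0001-a00.cfg";
'''

# Stand-in directory that contains only the pinmux file.
with tempfile.TemporaryDirectory() as cfg_dir:
    pinmux = os.path.join(cfg_dir, "tegra19x-xavier-nx-cti-mb1-pinmux-p3668-a01.cfg")
    open(pinmux, "w").close()
    missing = missing_config_files(conf_text, cfg_dir)
print(missing)
```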


Anyway, it seems I am able to generate the bl_update_payload correctly but am still unable to flash it.

Thanks!
Joe

p.s. I also wanted to mention that I am building cboot from source with some custom changes (related to using the SD card as a “fall-back” boot option if the eMMC fails). I don’t think this should impact the issue, but I wanted to mention it.

Okay I think I have some good news.

I changed my waggle_photon.conf file to source p3668.conf.common instead of p3668.cti-base.common, which ensures that the PMC_CONFIG variable is set correctly and that OTA_BOOT_DEVICE and OTA_GPT_DEVICE are set.

source "${LDK_DIR}/p3668.conf.common";
DTB_FILE=tegra194-xavier-nx-cti-NGX003-WAGGLE-WS.dtb;
EMMC_CFG=flash_waggle_l4t_t194_spi_emmc_p3668.xml;
EMMCSIZE=17179869184;
ROOTFSSIZE=10GiB;
# WORK-AROUND: bring-in CTI specific changes from p3668.cti-base.common (compared to p3668.conf.common)
PINMUX_CONFIG="tegra19x-xavier-nx-cti-mb1-pinmux-p3668-a01.cfg";

and my SMD partition looks fixed!

root@ws-nxcore:~# nvbootctrl dump-slots-info
magic:0x43424e00,             version: 3             features: 1             num_slots: 2
slot: 0,             priority: 15,             suffix: _a,             retry_count: 7,             boot_successful: 1
slot: 1,             priority: 14,             suffix: _b,             retry_count: 7,             boot_successful: 1

I also see that /etc/nv_boot_control.conf now has different values for TEGRA_OTA_BOOT_DEVICE and TEGRA_OTA_GPT_DEVICE:

Before:

root@ws-nxcore:~# cat /etc/nv_boot_control.conf
TNSPEC 3668-200-0001-G.0-1-0-waggle_photon-mmcblk0p1
TEGRA_CHIPID 0x19
TEGRA_OTA_BOOT_DEVICE /dev/mmcblk0boot0
TEGRA_OTA_GPT_DEVICE /dev/mmcblk0boot1

After:

root@ws-nxcore:~# cat /etc/nv_boot_control.conf
TNSPEC 3668-200-0001-G.0-1-0-waggle_photon-mmcblk0p1
TEGRA_CHIPID 0x19
TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0
TEGRA_OTA_GPT_DEVICE /dev/mtdblock0
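
To make the before/after comparison explicit, here is a short Python sketch that reports only the keys whose values differ between the two files. `changed_keys` is a hypothetical helper, and the sketch assumes the simple `KEY value` line format shown above.

```python
def parse_conf(text):
    """Parse 'KEY value' lines into a dict, ignoring blank or malformed lines."""
    conf = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def changed_keys(before_text, after_text):
    """Return {key: (before, after)} for every key whose value differs."""
    before, after = parse_conf(before_text), parse_conf(after_text)
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(set(before) | set(after))
        if before.get(key) != after.get(key)
    }


# The two file contents shown above.
before = """TNSPEC 3668-200-0001-G.0-1-0-waggle_photon-mmcblk0p1
TEGRA_CHIPID 0x19
TEGRA_OTA_BOOT_DEVICE /dev/mmcblk0boot0
TEGRA_OTA_GPT_DEVICE /dev/mmcblk0boot1
"""
after = """TNSPEC 3668-200-0001-G.0-1-0-waggle_photon-mmcblk0p1
TEGRA_CHIPID 0x19
TEGRA_OTA_BOOT_DEVICE /dev/mtdblock0
TEGRA_OTA_GPT_DEVICE /dev/mtdblock0
"""
changes = changed_keys(before, after)
print(changes)
```

Only the two OTA device entries differ; the TNSPEC and chip ID are identical in both files.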

Do you know why this resolved my issue?
