Initrd massflash issues (JetPack 5.1.2)

Hello there,
I am trying to scale up our flashing process to collect more data about the compute modules. While mass flashing, I ran into flashing failures with the initrd massflash in JetPack 5.1.2.

Problem Isolation:

I am currently connecting 8-10 Jetson Xavier NX in recovery mode.

To narrow down the core issue, I made sure that…

  • the massflash image is generated for at least 10 Jetson devices
  • my power supplies can handle 10 Xavier NX booting simultaneously (I have tested this with 16 booting Xavier NX)
  • each Jetson has its own USB host controller for the flashing process
usb3              1d6b:0002 09 1IF  [USB 2.00,   480 Mbps,   0mA] (xhci-hcd 0000:03:00.0) hub
  3-1               0955:7e19 00 1IF  [USB 2.00,   480 Mbps,  32mA] (NVIDIA Corp. APX)
usb4              1d6b:0003 09 1IF  [USB 3.00,  5000 Mbps,   0mA] (xhci-hcd 0000:03:00.0) hub
usb5              1d6b:0002 09 1IF  [USB 2.00,   480 Mbps,   0mA] (xhci-hcd 0000:04:00.0) hub
  5-1               0955:7e19 00 1IF  [USB 2.00,   480 Mbps,  32mA] (NVIDIA Corp. APX)
usb6              1d6b:0003 09 1IF  [USB 3.00,  5000 Mbps,   0mA] (xhci-hcd 0000:04:00.0) hub
usb7              1d6b:0002 09 1IF  [USB 2.00,   480 Mbps,   0mA] (xhci-hcd 0000:05:00.0) hub
  7-1               0955:7e19 00 1IF  [USB 2.00,   480 Mbps,  32mA] (NVIDIA Corp. APX)
usb8              1d6b:0003 09 1IF  [USB 3.00,  5000 Mbps,   0mA] (xhci-hcd 0000:05:00.0) hub
usb9              1d6b:0002 09 1IF  [USB 2.00,   480 Mbps,   0mA] (xhci-hcd 0000:06:00.0) hub
  9-1               0955:7e19 00 1IF  [USB 2.00,   480 Mbps,  32mA] (NVIDIA Corp. APX)
usb10             1d6b:0003 09 1IF  [USB 3.00,  5000 Mbps,   0mA] (xhci-hcd 0000:06:00.0) hub
usb11             1d6b:0002 09 1IF  [USB 2.00,   480 Mbps,   0mA] (xhci-hcd 0000:09:00.0) hub
  11-1              0955:7e19 00 1IF  [USB 2.00,   480 Mbps,  32mA] (NVIDIA Corp. APX)
usb12             1d6b:0003 09 1IF  [USB 3.00,  5000 Mbps,   0mA] (xhci-hcd 0000:09:00.0) hub
usb13             1d6b:0002 09 1IF  [USB 2.00,   480 Mbps,   0mA] (xhci-hcd 0000:0a:00.0) hub
  13-1              0955:7e19 00 1IF  [USB 2.00,   480 Mbps,  32mA] (NVIDIA Corp. APX)
usb14             1d6b:0003 09 1IF  [USB 3.00,  5000 Mbps,   0mA] (xhci-hcd 0000:0a:00.0) hub
usb15             1d6b:0002 09 1IF  [USB 2.00,   480 Mbps,   0mA] (xhci-hcd 0000:0b:00.0) hub
  15-1              0955:7e19 00 1IF  [USB 2.00,   480 Mbps,  32mA] (NVIDIA Corp. APX)
usb16             1d6b:0003 09 1IF  [USB 3.00,  5000 Mbps,   0mA] (xhci-hcd 0000:0b:00.0) hub
usb17             1d6b:0002 09 1IF  [USB 2.00,   480 Mbps,   0mA] (xhci-hcd 0000:0c:00.0) hub
  17-1              0955:7e19 00 1IF  [USB 2.00,   480 Mbps,  32mA] (NVIDIA Corp. APX)
usb18             1d6b:0003 09 1IF  [USB 3.00,  5000 Mbps,   0mA] (xhci-hcd 0000:0c:00.0) hub
  • my host system can handle that many USB devices (I only saw about 100% CPU load on each core for 1-2 seconds at the start and end of the massflash, due to USB mount and unmount operations)
  • I start the process with the highest I/O priority as stated in the “README_initrd_flash.txt”
    → sudo ionice -c 1 -n 0 ./tools/kernel_flash/l4t_initrd_flash.sh --flash-only --network usb0 --massflash 10
  • the issue also appears on a different host system (48-core workstation with about 64 GB of RAM), tested on Ubuntu 18.04 and 20.04
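
For anyone reproducing this setup, the device check in the listing above could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, and 0955:7e19 is simply the ID that the Xavier NX modules report in my listing (other modules report a different product ID).

```shell
# Count Jetson modules in recovery mode from `lsusb` output read on stdin.
# Assumption: the modules enumerate as 0955:7e19 (NVIDIA Corp. APX).
count_recovery_devices() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence `|| true`.
    grep -c -i 'ID 0955:7e19' || true
}

# Hypothetical helper: fail unless exactly $1 modules are in recovery mode.
check_recovery_devices() {
    local expected="$1" found
    found=$(count_recovery_devices)
    if [ "$found" -ne "$expected" ]; then
        echo "expected $expected devices in recovery mode, found $found" >&2
        return 1
    fi
    echo "found $found devices in recovery mode"
}

# Usage: lsusb | check_recovery_devices 10
```

Running it right before the flash command catches a module that did not enter recovery mode, before that module is counted as a flashing failure.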

To get more information than “Flashing Failed”, I modified the initrd massflash script to collect more data. The issues below are also reproducible with the unmodified script.

The errors seem to be related to storage devices becoming unavailable. My current assumption is that one of the flashing scripts is unmounting a Jetson that is being handled by another flashing process.

Failed flashing logs:

The following logs were collected on an Ubuntu 18.04 system (4-core CPU and 8 GB of RAM).


336+0 records in
336+0 records out
336 bytes copied, 0,0125908 s, 26,7 kB/s
Writing bpmp-fw-dtb_b partition done
writing item=55, 1:3:kernel_b, 15142551552, 67108864, boot.img, 43569152, fixed-<reserved>-12, a84d1802f9dc10564ecff0d3fcb06082c815142e
Writing kernel_b partition with boot.img
Get size of partition through connection.
blockdev: cannot open /dev/sdc12: No such file or directory
[ 1825]: l4t_flash_from_kernel: Get size of partition failed
[ 1825]: l4t_flash_from_kernel: Error flashing emmc
Error flashing non-qspi storage

Cleaning up...

452+0 records in
452+0 records out
452 bytes copied, 0,00270544 s, 167 kB/s
Writing recovery-dtb partition done
writing item=60, 1:3:RECROOTFS, 15328608256, 104857600, , , fixed-<reserved>-17, 
[ 703]: l4t_flash_from_kernel: Warning: skip writing RECROOTFS partition as no image is specified
writing item=61, 1:3:esp, 15433465856, 67108864, esp.img, 67108864, fixed-<reserved>-18, 81add5846db4c52f28a11ba16df00871a06b70c2
Writing esp partition with esp.img
Get size of partition through connection.
blockdev: cannot open /dev/sdx18: No such file or directory
[ 703]: l4t_flash_from_kernel: Get size of partition failed
[ 703]: l4t_flash_from_kernel: Error flashing emmc
Error flashing non-qspi storage

Cleaning up...


336+0 records in
336+0 records out
336 bytes copied, 0,00240753 s, 140 kB/s
Writing bpmp-fw-dtb_b partition done
writing item=55, 1:3:kernel_b, 15142551552, 67108864, boot.img, 43569152, fixed-<reserved>-12, a84d1802f9dc10564ecff0d3fcb06082c815142e
Writing kernel_b partition with boot.img
Get size of partition through connection.
blockdev: cannot open /dev/sdb12: No such file or directory
[ 855]: l4t_flash_from_kernel: Get size of partition failed
[ 855]: l4t_flash_from_kernel: Error flashing emmc
Error flashing non-qspi storage

Cleaning up...


Formatting APP parition done
Formatting APP partition /dev/sdf1 ...
tar --xattrs -xpf /opt/tobias/framework/assets/initrd_massflash/current_image/tools/kernel_flash/images/internal/system.img  --checkpoint=10000 --warning=no-timestamp --numeric-owner  -C  /tmp/ci-bmRAhaKgLy
tar: Read checkpoint 10000
tar: Read checkpoint 20000
tar: Read checkpoint 30000
tar: Read checkpoint 40000
tar: ./usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc: Cannot write: Read-only file system
tar: ./usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc: Cannot utime: Read-only file system
tar: ./usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc: Cannot change ownership to uid 0, gid 0: Read-only 

[a lot of errors of the same kind...]

tar: ./usr/share/perl/5.30.0/CPAN/Meta/YAML.pm: Cannot open: Input/output error
tar: ./usr/share/perl/5.30.0/CPAN/Meta/History: Cannot mkdir: Input/output error
tar: ./usr/share/perl/5.30.0/CPAN/Meta/History/Meta_1_3.pod: Cannot open: Input/output error
tar: ./usr/share/perl/5.30.0/CPAN/Meta/History/Meta_1_2.pod: Cannot open: Input/output error
tar: ./usr/share/perl/5.30.0/CPAN/Meta/History/Meta_1_0.pod: Cannot open: Input/output error
Cleaning up...


The failure rate is about 15-20% of the connected devices, and the USB port on which a failure occurs seems to be random.
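
To put a number on the failure rate across runs, the per-device logs can be scanned for the error strings seen in the excerpts above. The following is a rough sketch only: the function name and the `./flash_logs` directory are placeholders, so pass whatever directory your setup writes one log per device into. The pattern list covers just the messages quoted in this post.

```shell
# Scan every *.log file in directory $1 and report the first known error line.
# The patterns are taken from the log excerpts in this post; extend as needed.
summarize_flash_logs() {
    local dir="$1" total=0 failed=0 log first_error
    local pattern='Error flashing|No such file or directory|Read-only file system|Input/output error'
    for log in "$dir"/*.log; do
        [ -e "$log" ] || continue          # directory contains no *.log files
        total=$((total + 1))
        first_error=$(grep -E "$pattern" "$log" | head -n 1)
        if [ -n "$first_error" ]; then
            failed=$((failed + 1))
            echo "FAIL $log: $first_error"
        fi
    done
    echo "$failed of $total logs contain a known error pattern"
}

# Usage (placeholder path): summarize_flash_logs ./flash_logs
```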

My actual questions

  • Has anyone experienced similar issues with the initrd massflash, or is this a known issue with the massflash script? If so, how did you solve it?
  • How many devices have been verified to flash reliably in parallel? (I assumed the limit was 10, because the default is limited to 10.)

Always post the full logs; don’t crop them.

I don’t think we have tested with that many devices.
How many devices can you reliably flash without failing?

Hello,

I now have 8 devices flashing simultaneously “almost” reliably. I still had a few issues during the setup of the massflash image. One of my problems was insufficient storage space: I didn’t get any errors during the generation/packing of the initrd massflash image, so I didn’t notice it at first. This broke the files that are created for the individual flashing instances, which is why the flashing errors depended on the instance ID.
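
Since the image generation did not report the shortage itself, a free-space check before generating the image would have caught it. A minimal sketch, assuming you pick the threshold yourself (I have not measured how much space the generation actually needs, and the function name is made up):

```shell
# Succeed only if the filesystem holding directory $1 has at least $2 GiB free.
require_free_gib() {
    local dir="$1" min_gib="$2" avail_kib
    # `df -P` uses the fixed POSIX layout: field 4 of line 2 is the available
    # space, and -k reports it in 1024-byte blocks.
    avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ "$avail_kib" -lt $((min_gib * 1024 * 1024)) ]; then
        echo "only $((avail_kib / 1024 / 1024)) GiB free under $dir, need $min_gib GiB" >&2
        return 1
    fi
}

# Usage (the 50 GiB threshold is an arbitrary example value):
#   require_free_gib . 50 || exit 1
```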

I have no idea what you are talking about here.
Also, this topic is marked as solved because you haven’t made a reply for a long time, so it’d be better for you to file a new one.

Hey Dave,

My issue is resolved.

TL;DR:
The bootloader directory contains a boot image for each USB instance that is created. Those files are generated when the initrd massflash image (mfi) is created. Some of them were broken because of insufficient storage space, which caused the mfi process to stop.

/bootloader $ ls ./boot*
./boot0.img ./boot1.img ./boot2.img ./boot3.img ./boot4.img ./boot.img
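
A quick sanity check along these lines would have pointed me to the broken files much earlier. It is only a sketch: the function name is made up, it assumes the `boot<N>.img` naming from the listing above with one file per instance, and it only detects missing or zero-length files, not truncated ones.

```shell
# Check that boot0.img .. boot<N-1>.img exist and are non-empty.
# $1 = bootloader directory, $2 = number of flashing instances.
check_instance_boot_images() {
    local bootloader_dir="$1" instances="$2" i=0 bad=0 img
    while [ "$i" -lt "$instances" ]; do
        img="$bootloader_dir/boot$i.img"
        if [ ! -s "$img" ]; then           # -s: file exists and size > 0
            echo "missing or empty: $img" >&2
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    [ "$bad" -eq 0 ]
}

# Usage: check_instance_boot_images ./bootloader 5
```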
