Title: How to Reduce Mass Flashing Time on NVIDIA Jetson Devices Over USB 2.0?

When using NVIDIA’s massflash feature to flash multiple Jetson devices in parallel, I’m facing a significant performance issue. I have around 10 Jetson boards (Xavier NX/Nano-class devices), and each board connects through USB 2.0 recovery mode.

Even when using the --massflash option with multiple devices, the total flashing time is extremely long:

  • 10 boards take around 120 minutes to complete a full flash

  • Flash speed per device becomes very inconsistent

  • Some devices take much longer when the host PC becomes overloaded

  • USB network interfaces appear/disappear, sometimes causing delays

My setup:

  • JetPack 5.x (L4T)

  • Flashing via l4t_initrd_flash.sh

  • Using USB 2.0 recovery mode (default)

  • Host PC: Ubuntu 20.04 with USB 2.0/3.0 ports

  • Massflash command example:

    sudo ./tools/kernel_flash/l4t_initrd_flash.sh \
        --massflash 10 --network usb0 --flash-only --reuse \
        jetson-xavier-nx-devkit-emmc mmcblk0p1
    
    

Main Problem

USB-based flashing seems to be the bottleneck. USB 2.0 is rated at 480 Mbit/s (60 MB/s theoretical, typically around 30-40 MB/s in practice), but actual throughput per device becomes extremely slow when flashing many boards in parallel.

Questions:

  1. Is it normal for massflash to take around 2 hours for 10 devices over USB 2.0?

  2. Does NVIDIA provide any method to speed up multi-device flashing?

  3. Is there a way to use Ethernet or network-based flashing instead of USB recovery mode?

  4. Would using multiple host PCs or multiple USB controllers significantly improve flashing speed?

  5. Does preparing a prebuilt image (system.img) or enabling --reuse help reduce per-device flashing time?


Additional Observations

  • Running two flash processes at the same time on the same PC makes the system lag heavily.

  • tar -xpf system.img during flashing is extremely slow and CPU-bound.

  • New USB interfaces appear for each device, sometimes making the network stack unstable.

  • USB 3.0 ports do not improve flashing speed because recovery mode is still USB 2.0.
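
To check whether the boards in recovery mode actually share one USB bus, a small script along these lines might help. It is only a sketch: it assumes a Linux host that exposes the usual sysfs attributes (idVendor, busnum, speed) under /sys/bus/usb/devices, and it matches on NVIDIA's USB vendor ID 0955. The function name devices_by_bus is made up for this example.

```python
from collections import defaultdict
from pathlib import Path

NVIDIA_VENDOR_ID = "0955"  # NVIDIA's USB vendor ID


def devices_by_bus(sysfs_root="/sys/bus/usb/devices", vendor_id=NVIDIA_VENDOR_ID):
    """Group USB devices from one vendor by the bus number they enumerate on."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}  # not a Linux host, or sysfs is not mounted
    buses = defaultdict(list)
    for dev in sorted(root.iterdir()):
        vendor_file = dev / "idVendor"
        if not vendor_file.is_file():
            continue  # interface entries such as 1-1:1.0 have no idVendor
        if vendor_file.read_text().strip().lower() != vendor_id:
            continue
        bus = (dev / "busnum").read_text().strip()
        speed = (dev / "speed").read_text().strip()  # negotiated speed in Mbit/s
        buses[bus].append((dev.name, speed))
    return dict(buses)


if __name__ == "__main__":
    for bus, devs in devices_by_bus().items():
        print(f"bus {bus}: {len(devs)} NVIDIA device(s)")
        for name, speed in devs:
            print(f"  {name} at {speed} Mbit/s")
```

If every board shows up under the same bus number, they are all competing for a single 480 Mbit/s link, which is the case a separate PCIe USB controller card would be meant to address.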


What I want to know

I’m looking for practical, real-world methods to reduce flashing time in production:

  • Using a more powerful host machine?

  • Using PCIe USB controller cards?

  • Splitting devices across multiple PCs?

  • Using network booting + provisioning instead of USB?

  • Pre-flashing eMMC modules externally?

  • Using NVIDIA factory tools?

If anyone has experience mass-producing Jetson boards or has optimized the NVIDIA massflash workflow, I’d really appreciate detailed recommendations.

Hi tuan100220,

I’ve moved your topic to the correct category for Xavier NX.

This does not look like the expected result to me.
Are there any errors during flashing? If not, at which step does it get stuck?

Massflash should be the right method for this use case.

Sorry, the current flash method is based on USB.

Please try another host PC to check whether the same issue occurs there.

I think the --flash-only option skips the image-creation step during flashing.

Could you share the full log from your host during flash?

Hi KevinFFF,

Thank you for your suggestions.

For my testing, flashing 1 device takes about 12 minutes, so flashing 10 devices would roughly be 12 minutes × 10. It seems the total time cannot be reduced further using massflash over USB 2.0. I suspect the USB 2.0 bandwidth is not sufficient, as all devices are connected to the same USB bus.
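
To make the arithmetic explicit, here is a rough sketch comparing the sequential total with the floor that a single shared USB 2.0 bus would impose. The 12 minutes and 10 devices come from my test; the 4 GiB image size and 35 MiB/s bus throughput are placeholder assumptions, not measured values, and both function names are made up for this example.

```python
def sequential_minutes(per_device_min, n_devices):
    """Total time if devices are effectively flashed one after another."""
    return per_device_min * n_devices


def shared_bus_floor_minutes(image_gib, n_devices, bus_mib_per_s):
    """Lower bound if all devices share one bus and the bus is the only limit."""
    total_mib = image_gib * 1024 * n_devices
    return total_mib / bus_mib_per_s / 60


if __name__ == "__main__":
    # Measured: about 12 minutes per device, 10 devices.
    print("sequential:", sequential_minutes(12, 10), "min")
    # Assumed (hypothetical): 4 GiB written per device, 35 MiB/s usable on the bus.
    floor = shared_bus_floor_minutes(4, 10, 35)
    print(f"shared-bus floor: {floor:.1f} min")
```

If the observed total stays close to the sequential figure and well above the shared-bus floor for the real image size, then bus bandwidth alone would not explain the slowdown, and host CPU or disk I/O would be worth checking as well.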

Regarding stability, the flash often stalls at the tar -xpf system.img step, causing my host machine to hang. Interestingly, the host can hang even when flashing only 2 devices, so it’s not purely a scaling issue. Here is an example from my flash log:

Allocating group tables:   0/112 done
Writing inode tables:   0/112 done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information:   0/112 done

Formatting APP partition done
Formatting APP partition /dev/mmcblk0p1 ...
tar -xpf /mnt/internal/system.img  --checkpoint=10000 --warning=no-timestamp --numeric-owner --xattrs --xattrs-include=*  -C  /tmp/ci-v2i0fRzb3O
tar: Read checkpoint 10000
tar: Read checkpoint 20000
...
Erased 33554432 bytes from address 0x00000000 in flash

Do you have any recommendations to speed up massflash over multiple devices, or to improve stability when using USB 2.0? Would using multiple USB controllers or Ethernet-based flashing significantly help?

Thanks for your advice.

Best regards,
Tuan

Why would flashing cause your host to hang?

Have you tried a USB 3.0 port on your host, given that you suspect the issue is caused by USB 2.0 bandwidth?

With massflash, the devices should be flashed at the same time, which is what reduces the total flashing time.
Please test with just 2 devices. If one device takes 12 minutes to flash, I would expect flashing 2 devices with massflash to take no more than 15 minutes.

I am flashing using the following command:

sudo ionice -c 1 -n 0 ./tools/kernel_flash/l4t_initrd_flash.sh \
    --network usb0 --flash-only --massflash 5

However, the flashing time becomes almost double compared to flashing a single device.
It looks like my CPU might be the bottleneck, as it gets heavily loaded during the process.
For reference, my host machine uses an Intel Core i5-12400.
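
To tell whether the host is really CPU-bound or mostly waiting on I/O during the flash, the aggregate counters in /proc/stat can be sampled twice and compared. This is a minimal sketch assuming a Linux host; the helper names read_cpu_times and cpu_shares are made up for this example.

```python
import time
from pathlib import Path


def read_cpu_times(stat_text):
    """Return (busy, iowait, total) ticks from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal guest guest_nice
            idle, iowait = values[3], values[4]
            total = sum(values[:8])  # guest time is already counted in user/nice
            return total - idle - iowait, iowait, total
    raise ValueError("no aggregate cpu line found")


def cpu_shares(before, after):
    """Return (busy %, iowait %) between two /proc/stat snapshots."""
    b0, w0, t0 = read_cpu_times(before)
    b1, w1, t1 = read_cpu_times(after)
    dt = t1 - t0
    if dt <= 0:
        return 0.0, 0.0
    return 100 * (b1 - b0) / dt, 100 * (w1 - w0) / dt


if __name__ == "__main__":
    stat = Path("/proc/stat")
    if stat.is_file():
        first = stat.read_text()
        time.sleep(1)  # sampling interval; longer intervals give smoother numbers
        busy, iowait = cpu_shares(first, stat.read_text())
        print(f"busy {busy:.1f}%  iowait {iowait:.1f}%")
```

A high busy share while tar is running would point to the CPU, while a high iowait share would suggest the host disk or the USB transfers are what the processes are waiting on.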

There has been no update from you for a while, so we assume this is no longer an issue.
We are therefore closing this topic. If you need further support, please open a new one.
Thanks

Could you check in the flash log whether the devices are being flashed at the same time?

Have you tried a more powerful host, and do you hit the same issue there?