I wanted to share some findings about massflash times when using initrd massflash. After some research, I found that the bottleneck is the NFS server that serves the rootfs and image files.
During the initrd phase, a minimal system is flashed over USB (the generated boot*.img images). This image brings up an RNDIS network interface over USB 2.0, which is then used to mount an NFS share whose read speed is capped at around 29 MB/s (this limit appears to come from the NFS kernel drivers). I tried modifying some NFS settings, but without much success. A single Jetson RNDIS USB 2.0 connection is enough to saturate this limit (~250 Mbit/s ≈ ~31 MB/s).
Snippet of network traffic over USB per system (Orin NX in this case): one system flashing vs. multiple systems flashing with the massflash script.
If one Jetson is enough to saturate the NFS server, the total flashing time should scale with the number of Jetsons, which turns out to be true if we do the math (an encrypted backup image was used for this calculation):
1 system massflash: 6-7 min
8 system massflash: 45-50 min
50 min / 8 ≈ 6 min per system
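The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model, assuming that one device already saturates the shared NFS link so that total time grows linearly with the device count; the function names are made up for illustration.

```python
def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert a link rate from Mbit/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8


def per_device_minutes(total_minutes: float, devices: int) -> float:
    """Average wall-clock time per device in a massflash run."""
    return total_minutes / devices


def estimated_total_minutes(single_minutes: float, devices: int) -> float:
    """Assumption: the shared NFS link is saturated by a single device,
    so the total time scales linearly with the number of devices."""
    return single_minutes * devices


if __name__ == "__main__":
    print(mbit_to_mbyte(250))             # RNDIS rate in MB/s
    print(per_device_minutes(50, 8))      # observed 8-system run
    print(estimated_total_minutes(6, 8))  # prediction from the 1-system run
```

With 6 minutes for a single system, the linear model predicts 48 minutes for 8 systems, which sits inside the observed 45-50 minute range.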
I tried modifying the boot*.img to include an iSCSI client. As a result, 4 Jetsons flashed over iSCSI finished in the same time as one Jetson using NFS.
Now let's take a look at some actual data from an ongoing flashing process.
Data rate snippet: iSCSI with 4 systems vs. NFS with (I think) 8 systems vs. NFS with 1 system:
I have seen that Jetpack 7 includes the “unified flashing” that uses ADB. I haven’t looked into it yet.
Now my question is: How many systems have you verified can be flashed simultaneously using this ADB-based flashing? Are there any plans to backport this to Jetpack 6?
One minor footnote: these flashing times assume an ideal USB setup:
USB root → Jetson 1
USB root → Jetson 2
USB root → Jetson 3
…and so on
Hi,
Do the type-A ports on the host PC support USB3? It seems the bottleneck is that the devices are being flashed over USB2, so the total bandwidth is limited to 480 Mbps.
Hi,
Yes, the USB ports on my host machine are USB 3.2 capable. However, the carrier board that I am using for flashing is only USB2 capable on the flashing port.
480 Mbps is the theoretical limit, before USB protocol overhead, the network stack over RNDIS, etc. The 250 Mbps of actual data transfer per USB port is entirely realistic when considering this. USB cannot be the bottleneck, because I am not even able to reach 250 Mbps when using NFS during initrd flash.
I also don’t expect much improvement by switching to USB 3.0, as WayneWWW already mentioned.
Hi,
If the USB port on the carrier board is USB2, the total bandwidth is 480 Mbps. On our developer kit (Orin NX module + Orin Nano carrier board), the type-C port is USB3, so it can achieve 5 Gbps.
USB cannot be the bottleneck during massflash. It's not possible to even reach full USB2 bandwidth when using 2 or more systems because of the NFS limitation of 29 MB/s read speed, as already described in my initial message.
An update on this topic: as it turns out, after some more benchmarks, there is another bottleneck.
After burning the USB3 fuse, I am able to achieve NFS read speeds of up to ~70 MB/s. Oddly enough, the time it takes to flash over USB2 and USB3 stays the same (when flashing only one system).
Some benchmarks of the actual read/write process for the extdev and QSPI revealed that the extdev flashing time drops drastically when using USB3. However, the QSPI takes at least ~2:56 minutes to erase (together with the write, a total of ~3:33 minutes).
On both USB2 and USB3:
[ 206]: l4t_flash_from_kernel: QSPI erased (/dev/mtd0)
real 2m56.275s
user 0m0.004s
sys 1m41.323s
[ 243]: l4t_flash_from_kernel: QSPI write
real 0m36.854s
user 0m1.969s
sys 0m23.411s
USB3
Flashing time (flash only, L4T 36.4.3): 4 min 48 sec
Flashing time (massflash 2 systems, L4T 36.4.3): 5 min 4 sec
[ 82]: l4t_flash_from_kernel: Successfully flash the external device
real 0m52.443s
user 0m16.264s
sys 0m27.817s
[ 249]: l4t_flash_from_kernel: Successfully flash the qspi
real 3m40.065s
user 0m1.949s
sys 1m58.626s
USB2
Flashing time (flash only, L4T 36.4.3): 4 min 44 sec
[ 127]: l4t_flash_from_kernel: Successfully flash the external device
real 1m37.700s
user 0m16.129s
sys 0m28.409s
[ 248]: l4t_flash_from_kernel: Successfully flash the qspi
real 3m38.724s
user 0m1.943s
sys 2m2.828s
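To compare the phases more easily, the `real` values from the logs above can be converted to seconds. The following is a small sketch, not part of the flashing tools; the helper names are hypothetical, and it assumes that the extdev and QSPI phases run concurrently, which the log timestamps and the nearly identical total times suggest.

```python
import re

# Matches the wall-clock line printed by `time`, e.g. "real 3m40.065s".
_REAL = re.compile(r"real\s+(\d+)m([\d.]+)s")


def real_seconds(line: str) -> float:
    """Return the wall-clock time of a `real XmY.YYYs` line in seconds."""
    match = _REAL.search(line)
    if match is None:
        raise ValueError(f"not a 'real' timing line: {line!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def critical_phase(phases: dict) -> tuple:
    """Assumption: phases run in parallel, so the slowest one sets the pace."""
    name = max(phases, key=phases.get)
    return name, phases[name]


# Timings copied from the logs above.
USB3 = {"extdev": real_seconds("real 0m52.443s"),
        "qspi": real_seconds("real 3m40.065s")}
USB2 = {"extdev": real_seconds("real 1m37.700s"),
        "qspi": real_seconds("real 3m38.724s")}

if __name__ == "__main__":
    for label, phases in (("USB3", USB3), ("USB2", USB2)):
        print(label, phases, critical_phase(phases))
```

In both cases the QSPI phase (about 220 s) is the slowest one, while USB3 only shortens the extdev phase from about 98 s to about 52 s.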
So we can conclude that it is not possible to get the flashing process below ~4:40 minutes (when using --flash-only).
Then there is still the question: why is NFS able to deliver ~70 MB/s with USB3 but only ~29 MB/s with USB2?
Yes and somewhat no. The actual flash process for each system is still ~4:40 during massflash (the 2nd device just started to connect via SSH a bit later, resulting in about 5 min). Given enough devices, the flashing process will eventually take longer again due to the shared 70 MB/s on the NFS side during extdev flashing (to be exact, once the extdev flashing process takes longer than the one for QSPI).
That still leaves open the question of why NFS does not scale very well. The cap is ~70 MB/s for USB3 and ~29 MB/s for USB2.
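As a rough illustration of where that crossover could lie, here is a minimal sketch. It assumes that a single device already uses the full NFS read bandwidth during the extdev phase, so extdev time grows linearly with the device count, and that the QSPI phase runs in parallel at a fixed duration. Both are simplifications, and the function names are hypothetical.

```python
import math


def max_devices_qspi_bound(extdev_single_s: float, qspi_s: float) -> int:
    """Largest device count for which the (shared) extdev phase still
    finishes no later than the fixed-length QSPI phase."""
    return math.floor(qspi_s / extdev_single_s)


def estimated_flash_seconds(devices: int, extdev_single_s: float,
                            qspi_s: float) -> float:
    """Assumption: extdev time scales linearly with the number of devices,
    QSPI time does not, and both phases overlap."""
    return max(qspi_s, devices * extdev_single_s)


if __name__ == "__main__":
    # USB3 single-device timings from the logs above, in seconds.
    extdev_s, qspi_s = 52.443, 220.065
    print(max_devices_qspi_bound(extdev_s, qspi_s))
    for n in (1, 2, 4, 8):
        print(n, estimated_flash_seconds(n, extdev_s, qspi_s))
```

Under these assumptions, the QSPI phase would stay the limiting factor up to 4 devices on USB3, and the extdev phase would take over from 5 devices onward. Real numbers will differ, since a single device may not fully saturate the NFS link.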