Model Training on NX

Not sure if these are silly questions or not, but would it technically be possible to link, say, 10 Xavier NXs together so that they act almost like one more powerful GPU? It would be cool to have distributed NX nodes that could be used to train a computer vision model, rather than one single (point of failure) powerful computer.

Failing that, could an individual NX take an existing working yolov4 model and use transfer learning to improve this model incrementally with, say, 100 new annotated images captured per day? I.e., during 12 hours of downtime overnight, the NX would use this time to become more accurate with more training data.

I have seen some applications that support distributed training across multiple GPUs in one machine. You are thinking of distributing the training workload across multiple machines, which is much harder and depends heavily on your use case and the model you want to train. It will definitely require custom application code.

Thanks @dkreutz. The use case would be yolov4 darknet object detection to detect anomalies in manufactured products.


To use multiple GPU devices, one of the bottlenecks is the I/O speed for sharing intermediate data.
For desktop GPUs, we have several techniques, such as NVLink, to improve the bandwidth.
However, these techniques may not all be available for XavierNX, since it is primarily designed for inference.

Sorry, we don’t have profiling data for training YOLOv4 on XavierNX.
But based on the table below, XavierNX can reach 618 fps on YOLOv3 Tiny.
So you can roughly estimate the expected training time for YOLOv4 based on the model difference, the number of epochs, and the training data size.


Thanks @AastaLLL. So, based on the YOLOv3 benchmark, can you give an example formula for calculating the estimated training time, please?

OK, rather than distributed Xavier NX nodes, how could I combine, say, 2, 4 or 8 in one location?


For example, based on the analysis here: YOLO: Real-Time Object Detection
you can see that the performance ratio between YOLOv3 and YOLOv3 Tiny is roughly 1:6 (YOLOv3 Tiny is about six times faster).

And based on Jetson benchmarks, YOLOv3 Tiny achieves 607 fps on the XavierNX.
So you can expect approximately 100 fps for YOLOv3 on the same device.
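
The scaling above is a simple division, and can be sketched as follows. The function name `estimate_fps` is only illustrative; the two input numbers are the ones quoted in this post, and the result is a rough inference estimate, not a measurement.

```python
def estimate_fps(reference_fps: float, slowdown_ratio: float) -> float:
    """Scale a measured throughput by a relative slowdown factor."""
    if slowdown_ratio <= 0:
        raise ValueError("slowdown_ratio must be positive")
    return reference_fps / slowdown_ratio


# Numbers quoted in this thread (inference throughput, not training).
TINY_FPS_XAVIER_NX = 607.0  # YOLOv3 Tiny on XavierNX, from the Jetson benchmarks
TINY_TO_FULL_RATIO = 6.0    # approximate YOLOv3 : YOLOv3 Tiny speed ratio (1:6)

yolov3_fps = estimate_fps(TINY_FPS_XAVIER_NX, TINY_TO_FULL_RATIO)
print(f"Estimated YOLOv3 inference throughput: {yolov3_fps:.0f} fps")
```

The same helper could be reused for YOLOv4 once a trustworthy speed ratio against YOLOv3 is available.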

We don’t have a similar comparison between YOLOv4 and YOLOv3.
You can try to find one from the author or the community.
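
As for a training-time formula, one possible back-of-the-envelope sketch is shown below. Everything in it is an assumption for illustration: the benchmark numbers are for optimized inference, so the `train_slowdown` factor (how many times slower a training step is than an inference pass), the epoch count, and the YOLOv4 throughput are hypothetical placeholders you would need to replace with your own measurements.

```python
def estimate_training_hours(
    num_images: int,
    epochs: int,
    inference_fps: float,
    train_slowdown: float,
) -> float:
    """Rough training time: total images processed / training throughput.

    train_slowdown is an ASSUMED factor describing how much slower one
    training step (forward + backward + update) is than one inference pass.
    """
    if inference_fps <= 0 or train_slowdown <= 0:
        raise ValueError("inference_fps and train_slowdown must be positive")
    train_images_per_second = inference_fps / train_slowdown
    total_images = num_images * epochs
    return total_images / train_images_per_second / 3600.0


# Hypothetical inputs: 100 new images per day, 50 passes over them,
# 100 fps inference (the YOLOv3 estimate), training assumed 10x slower.
hours = estimate_training_hours(
    num_images=100, epochs=50, inference_fps=100.0, train_slowdown=10.0
)
print(f"Estimated training time: {hours:.2f} hours")
```

The result is only as good as the assumed slowdown factor, so timing a few real training iterations on the device is the more reliable way to calibrate it.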

For a standard XavierNX, you will need a local network to share data between Jetsons.
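
To get a feeling for why the data-sharing speed matters, here is a rough sketch of how long it would take to send one full set of gradients over the network. The parameter count (about 64 million for YOLOv4), 32-bit gradients, and a 1 Gbit/s link are assumptions for illustration, and the calculation ignores protocol overhead and any smarter exchange scheme.

```python
def sync_seconds(num_params: int, bytes_per_param: int, link_gbit_per_s: float) -> float:
    """Time to push one full gradient copy through a network link."""
    if link_gbit_per_s <= 0:
        raise ValueError("link_gbit_per_s must be positive")
    payload_bytes = num_params * bytes_per_param
    link_bytes_per_second = link_gbit_per_s * 1e9 / 8  # bits -> bytes
    return payload_bytes / link_bytes_per_second


# ASSUMED values: ~64M parameters, float32 gradients, 1 Gbit/s Ethernet.
seconds = sync_seconds(num_params=64_000_000, bytes_per_param=4, link_gbit_per_s=1.0)
print(f"One gradient transfer: {seconds:.2f} s")
```

If each training step needs a transfer like this, the network time can easily dominate the compute time, which is why multi-GPU desktop systems rely on much faster interconnects.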