Not sure if these are silly questions or not, but would it technically be possible to link say 10 Xavier NXs together to almost act as one more powerful GPU? It would be cool to have distributed NX nodes that could be used to train a computer vision dataset, rather than one single (point of failure) powerful computer.
Failing that, could an individual NX take an existing working YOLOv4 model and use transfer learning to improve it incrementally with, say, 100 new annotated images captured per day? I.e. during 12 hours of downtime overnight, the NX uses this time to become more accurate with more training data.
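To illustrate what "incremental transfer learning" means in miniature: freeze the pretrained feature extractor and only update the final layers on each day's new samples. This is a toy sketch of the idea, not YOLOv4/darknet code; the model, values, and function names are all made up for illustration.

```python
# Toy illustration of incremental transfer learning: freeze the "backbone"
# parameter and update only the "head" on each day's new samples.
# NOT YOLOv4/darknet code -- just the general freeze-and-fine-tune pattern.

def fine_tune_head(backbone_w, head_w, daily_data, lr=0.05, epochs=50):
    """Fit head_w for the model y = head_w * (backbone_w * x),
    keeping backbone_w frozen, by plain gradient descent on MSE."""
    for _ in range(epochs):
        grad = 0.0
        for x, y in daily_data:
            feature = backbone_w * x          # frozen feature extractor
            pred = head_w * feature
            grad += 2 * (pred - y) * feature  # d(MSE)/d(head_w)
        head_w -= lr * grad / len(daily_data)
    return head_w

# Pretend yesterday's model plus today's newly annotated samples:
backbone_w = 2.0                  # frozen after the initial training run
head_w = 0.1                      # refined overnight
data = [(x, 3.0 * (2.0 * x)) for x in [0.1, 0.5, 1.0]]  # true head = 3.0
head_w = fine_tune_head(backbone_w, head_w, data)
print(round(head_w, 2))
```

With a real detector the same loop would load yesterday's weights, train only the detection head on the new images, and save the result for the next day.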
I have seen some applications that support distributed training across multiple GPUs in one machine. What you describe is distributing the training workload across multiple machines, which is much harder and depends heavily on your use case and the model you want to train. It will definitely require custom application code.
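The core pattern that custom code would have to implement is synchronous data parallelism: each node computes a gradient on its own data shard, the gradients are averaged across nodes (the "all-reduce" step that real frameworks do over the network), and every node applies the same update. A minimal single-process simulation of that pattern, with made-up toy data:

```python
# Minimal simulation of synchronous data-parallel training across 10 nodes:
# each "node" computes a gradient on its own shard, gradients are averaged
# (standing in for a network all-reduce), then one shared update is applied.
# Real multi-machine setups implement the same loop over NCCL/MPI.

def local_gradient(w, shard):
    """Gradient of mean-squared error for the toy model y = w * x."""
    g = sum(2 * (w * x - y) * x for x, y in shard)
    return g / len(shard)

def all_reduce_mean(grads):
    """Stand-in for the network all-reduce: average per-node gradients."""
    return sum(grads) / len(grads)

# Toy dataset for the ground truth y = 4 * x, split across 10 nodes.
data = [(k / 100, 4 * k / 100) for k in range(1, 101)]
shards = [data[i::10] for i in range(10)]

w = 0.0
for step in range(200):
    grads = [local_gradient(w, s) for s in shards]  # per-node compute
    w -= 0.05 * all_reduce_mean(grads)              # synchronized update

print(round(w, 2))
```

The averaging step is exactly where the I/O bottleneck mentioned below appears: every iteration, each node must exchange its full gradient with the others before anyone can take the next step.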
When using multiple GPU devices, one of the bottlenecks is the I/O speed for sharing the intermediate data.
For desktop GPUs, there are techniques such as NVLink to improve the bandwidth.
However, these techniques may not be available on Xavier NX, since it was originally designed for inference.
And sorry, we don't have profiling data for training YOLOv4 on Xavier NX.
But based on the table below, Xavier NX can reach 618 fps on YOLOv3 Tiny.
So you can roughly estimate the expected training time for YOLOv4 from the model difference, the number of epochs, and the training data size.
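As a back-of-envelope example of such an estimate: start from the 618 fps inference figure and discount it for training cost and model size. The slowdown and model factors below are illustrative assumptions, not measured values, so treat the result only as an order-of-magnitude sanity check.

```python
# Back-of-envelope training-time estimate from an inference throughput figure.
# Training is far slower than inference (forward + backward + weight update),
# and YOLOv4 is heavier than YOLOv3 Tiny, so we discount the fps heavily.
# Both discount factors are ASSUMED for illustration, not profiled.

inference_fps = 618          # Xavier NX on YOLOv3 Tiny (from the table)
train_slowdown = 10          # ASSUMED: training vs inference cost per image
model_factor = 8             # ASSUMED: YOLOv4 vs YOLOv3 Tiny compute ratio

train_fps = inference_fps / (train_slowdown * model_factor)

images = 100                 # new annotated images per day
epochs = 50                  # fine-tuning passes over the new data
seconds = images * epochs / train_fps
print(f"~{seconds / 3600:.1f} h per nightly fine-tune")
```

Even with these pessimistic factors the nightly fine-tune on 100 new images fits well inside a 12-hour window; rerun the arithmetic with your own measured throughput once you have it.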