Distributing a machine learning model across Jetson TX2 / AGX Xavier devices

I am trying to distribute deep learning models across two Jetson devices (TX2 / AGX Xavier) and two other GPUs (GeForce 1070, Tesla 2075) for experimental purposes.

My attempts so far at distributing a TensorFlow-GPU model across these four heterogeneous GPU devices, connected over a LAN, have been neither easy nor successful, most likely due to configuration and communication issues.

Following web tutorials on distributed TensorFlow-GPU with asynchronous gradient descent, a simple MNIST model halts as soon as the 3rd worker tries to synchronize its gradients with the chief worker over gRPC.
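For context, the between-graph setup I am following looks roughly like the sketch below. The hostnames and ports are placeholders standing in for my four machines, and the TF 1.x calls are shown only in comments since they require a running cluster:

```python
# Placeholder cluster spec for the four machines: one parameter server plus
# the remaining devices as workers. All hostnames/ports are made up.
CLUSTER = {
    "ps": ["chief-host:2222"],
    "worker": ["geforce-host:2223", "tesla-host:2224",
               "xavier1:2225", "xavier2:2226"],
}

def job_of(task_index):
    """Map a flat task index onto a (job_name, index) pair for the server."""
    n_ps = len(CLUSTER["ps"])
    if task_index < n_ps:
        return "ps", task_index
    return "worker", task_index - n_ps

# Each process would then start a gRPC server (TF 1.x API, sketch only):
# import tensorflow as tf
# cluster = tf.train.ClusterSpec(CLUSTER)
# job, idx = job_of(my_task_index)
# server = tf.train.Server(cluster, job_name=job, task_index=idx)
```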

I am looking for either a more suitable communication library to tie the GPUs together, or a reason to shift to PyTorch or another framework altogether.

So, my question is:
Does the AGX Xavier support any communication mechanism for such a heterogeneous GPU setup, other than TensorFlow's gRPC (e.g. NCCL, OpenMPI, Gloo, …)?
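If I were to switch to PyTorch, for example, my understanding is that `torch.distributed` with the Gloo backend does its collectives over plain TCP, which should in principle work between heterogeneous hosts. A hedged sketch of the rendezvous I have in mind (the master address, port, and ranks are placeholders, and the actual `torch.distributed` calls are commented out since they need the full cluster):

```python
def rendezvous_env(master_addr, master_port, rank, world_size):
    """Build the environment variables torch.distributed reads for its
    TCP rendezvous. All concrete values here are illustrative only."""
    return {
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
        "RANK": str(rank),
        "WORLD_SIZE": str(world_size),
    }

# On each of the four machines (sketch only, untested on Jetson):
# import os
# import torch.distributed as dist
# os.environ.update(rendezvous_env("chief-host", 29500, my_rank, 4))
# dist.init_process_group(backend="gloo")  # CPU-side collectives over TCP
# dist.all_reduce(grad_tensor)             # synchronize gradients
```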

If anybody has experience with such a heterogeneous distribution, please suggest what I have to do to get it running, given the following experimental configuration:


Python: 3.5
OS: Ubuntu 16.04
CUDA: 10.0 (GeForce/TX2), 7.5 (Tesla)
TensorFlow-GPU: 1.9 (GeForce/Tesla), 1.10 (Xavier)


This issue can be better addressed on the Xavier platform.

Xavier has NVLINK to connect to other iGPU platforms (e.g. Jetson).
You can also link dGPUs (e.g. Tesla cards) with NVSWITCH.

Check this introduction for more information:


Thank you, but for my use case I intend to distribute the GPU nodes across a LAN/WAN.
Therefore, NVLINK does not seem to be an option for me.
In fact, I am looking for a kind of extended version of this cluster: [http://selkie-macalester.org/csinparallel/modules/RosieCluster/build/html/]. (This person has done a splendid job, but I have some queries about implementing it and taking it further.)


Maybe this also can help: