I am trying to distribute deep learning models across two NVIDIA Jetson devices (a Xavier AGX and a TX2) and two other GPUs (a GeForce GTX 1070 and a Tesla 2075) for experimental purposes.
Distributing a TensorFlow-GPU model across these four heterogeneous GPU devices connected over a LAN has so far been neither easy nor successful, likely due to configuration and communication issues.
Following web tutorials on Distributed TensorFlow with asynchronous gradient descent, my simple MNIST model halts as soon as the third worker tries to synchronize its gradients with the chief worker over gRPC.
I am looking for either a more suitable communication library to tie the GPUs together, or to switch to PyTorch or another framework entirely.
So, my questions are:
Does the Xavier AGX support any communication mechanism for such a heterogeneous GPU setup other than TensorFlow's gRPC (e.g. NCCL, OpenMPI, Gloo, …)?
If anybody has experience with such a heterogeneous distribution, what would you suggest to get it running, given the following experimental configuration?
OS: Ubuntu 16.04
CUDA: 10.0 (GeForce/TX2), 7.5 (Tesla)
TensorFlow-GPU: 1.9 (GeForce/Tesla), 1.10 (Xavier)
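Since switching to PyTorch is one option I am considering, here is a minimal single-process sketch I put together of the Gloo backend I would hope to use (this is my own illustration, not from a tutorial; the address, port, rank, and world size are placeholders for the real four-machine setup):

```python
# Minimal single-process sanity check of PyTorch's Gloo backend (CPU-only),
# the kind of collective I would hope to run across the Jetson/GeForce/Tesla boxes.
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # chief node address (placeholder)
os.environ.setdefault("MASTER_PORT", "29500")      # rendezvous port (placeholder)

# world_size=1 here only proves the backend initializes; on the real cluster
# each machine would run this with its own rank and world_size=4.
dist.init_process_group(backend="gloo", rank=0, world_size=1)

t = torch.tensor([1.0, 2.0, 3.0])
dist.all_reduce(t, op=dist.ReduceOp.SUM)  # sum across ranks (identity with one rank)
print(t.tolist())

dist.destroy_process_group()
```

If something this small fails on any one of the machines, that would already narrow down whether the problem is the framework or the network configuration.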