I’ve tried to download the wheel files from the developer.nvidia.com page for Jetson, but torch-2.1.0 doesn’t support torch.distributed and torch-1.11.0 doesn’t support the NCCL backend. How can I get a torch build whose “distributed” package supports the NCCL backend on the GPU of a Jetson device?
Also, I’ve seen some responses saying that the NCCL backend is not supported on the Jetson platform. Does that hold for all versions of CUDA, PyTorch, and JetPack? Ideally there would be a pre-compiled torch wheel that supports it. If not, is it possible to build a torch wheel on my own to achieve that?
I’ve tried the JetPack 5.1.2 pre-built wheel of PyTorch 2.1.0, but torch.distributed is not supported on Jetson devices.
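For reference, a simple way to confirm what a given wheel actually supports is to query torch.distributed directly; a minimal check along these lines (nothing Jetson-specific is assumed):

# Minimal check of what the installed PyTorch wheel supports.
import torch
import torch.distributed as dist

print("torch version:        ", torch.__version__)
print("CUDA available:       ", torch.cuda.is_available())
print("distributed available:", dist.is_available())
if dist.is_available():
    print("NCCL backend:", dist.is_nccl_available())
    print("MPI backend: ", dist.is_mpi_available())
    print("Gloo backend:", dist.is_gloo_available())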
I want to test distributed inference on a multi-device system using the GPU. The torch.distributed documentation says the communication backend for GPU is NCCL, while GPU availability for the MPI backend is marked with a “?”.
Also, since network communication is mainly bounded by the network bandwidth, MPI and NCCL may have similar performance for distributed communication. The MPI backend might be acceptable as the GPU communication backend, but I wonder whether it actually works for distributed communication between GPUs, and how large the performance gap between them is.
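To make the comparison concrete, something along these lines could time the same all_reduce under either backend (the launcher, tensor size, and iteration count here are assumptions for illustration, not from this thread):

# Rough timing of an all_reduce so MPI and NCCL (or gloo) can be compared.
# Assumes a proper launcher: mpirun for the MPI backend, torchrun for
# NCCL/gloo (which sets MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE).
import sys, time
import torch
import torch.distributed as dist

backend = sys.argv[1] if len(sys.argv) > 1 else "mpi"
dist.init_process_group(backend=backend)

device = "cuda" if backend == "nccl" else "cpu"   # NCCL requires CUDA tensors
x = torch.ones(1 << 20, device=device)

dist.all_reduce(x)                                # warm-up
if device == "cuda":
    torch.cuda.synchronize()

t0 = time.time()
for _ in range(20):
    dist.all_reduce(x)
if device == "cuda":
    torch.cuda.synchronize()
print(f"rank {dist.get_rank()} backend {backend}: "
      f"{(time.time() - t0) / 20 * 1e3:.2f} ms per all_reduce")

dist.destroy_process_group()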
Hi AsstaLLL:
I’ve now seen that issue. So for the pre-built PyTorch wheels, only the wheels matching JetPack 6.1 and later have the NCCL backend enabled on the GPU? Thanks again for your reply!
RuntimeError: CUDA tensor detected and the MPI used doesn’t have CUDA-aware MPI support
I want to ask: what is the easiest approach to support distributed communication of CUDA tensors? As mentioned before, do you mean NCCL is available on JetPack 6.1 or 6.2?
And on my current system (JetPack 5.1.2 with CUDA 11.4 on a Jetson Orin Nano), is the only way to communicate CUDA tensors to move them to the CPU, transmit them to the other device, and then reload them onto the GPU?
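To be concrete, the CPU-staging path I have in mind looks roughly like this (ranks, shapes, and the choice of the MPI backend are placeholders):

# Sketch of the CPU-staging workaround: move CUDA tensors to the CPU before
# send/recv with the MPI backend, then move them back to the GPU afterwards.
import torch
import torch.distributed as dist

dist.init_process_group(backend="mpi")
rank = dist.get_rank()

if rank == 0:
    gpu_tensor = torch.randn(1024, device="cuda")
    dist.send(gpu_tensor.cpu(), dst=1)          # stage through host memory
else:
    buf = torch.empty(1024)                     # CPU receive buffer
    dist.recv(buf, src=0)
    gpu_tensor = buf.to("cuda")                 # reload on the GPU

dist.destroy_process_group()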
Many thanks for your reply!
I haven’t built OpenMPI with CUDA support. Do you mean [OpenMPI with CUDA support] should be built in addition, as shown in the .sh file?
And with a CUDA-aware OpenMPI, can CUDA tensors be transferred with the MPI backend? Can it work with JetPack 5.1.2?
I will try to build it.
Thanks!
We have enabled the torch.distributed flag and the backend uses MPI.
It seems to work correctly, since we haven’t received any reports about that functionality.
But we are not sure whether our users transmit tensors on the CPU or the GPU.
As shared above, the CUDA support in OpenMPI can also work well.
So we recommend giving it a try to see whether GPU tensor transfers through torch.distributed work.
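For example, a small test like the one below (ranks and sizes are placeholders) should fail with the CUDA-aware error quoted above when OpenMPI lacks CUDA support, and should complete once it is built with it:

# Small test: send a CUDA tensor directly through the MPI backend.
# Launch with something like: mpirun -np 2 python test_gpu_send.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="mpi")
rank = dist.get_rank()

t = torch.full((8,), float(rank), device="cuda")
if rank == 0:
    dist.send(t, dst=1)
else:
    dist.recv(t, src=0)
    print("received from rank 0:", t)

dist.destroy_process_group()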
Does that mean the CUDA-aware MPI was successfully built and installed?
Besides, on the Jetson device, PyTorch 1.11 with torch.distributed enabled is installed inside a virtual environment (python-venv). Is the CUDA-aware MPI backend accessible there now? Or what else should I do to enable it after running build_openMPI.sh?
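To make the question concrete, this is the kind of sanity check I would run from inside the venv (it only queries torch and ompi_info; it cannot prove that torch actually links against the newly built OpenMPI):

# Quick check from inside the python-venv: does the installed torch wheel
# expose the MPI backend, and does the OpenMPI found on PATH report CUDA support?
import subprocess
import torch.distributed as dist

print("MPI backend available in torch:", dist.is_mpi_available())
out = subprocess.run(
    "ompi_info --parsable --all | grep mpi_built_with_cuda_support:value",
    shell=True, capture_output=True, text=True,
)
print("ompi_info:", out.stdout.strip())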
Thanks!
I still have a problem building the CUDA-aware OpenMPI: running build_openMPI.sh always ends with mpi_built_with_cuda_support reported as false.
When I try to execute the build script with the “--with-cuda-libdir=/usr/lib/aarch64-linux-gnu” option added, just like what was done in:
But the output shows “configure: WARNING: unrecognized options: --with-cuda-libdir”. Can you offer some help? Thanks for your time.
Hi,
I didn’t see any explicit error output, and OpenMPI builds successfully, just with “mpi_built_with_cuda_support: false”.
Here is the full output from running the script with the “--with-cuda-libdir=/usr/lib/aarch64-linux-gnu” option; I don’t know whether this is helpful. build.log (467.0 KB)
Thanks.
Hi,
The verification info is shown below:

nvidia@tegra-ubuntu:~$ ompi_info --parsable --all | grep mpi_built_with_cuda_support
mca:opal:base:param:opal_built_with_cuda_support:synonym:name:mpi_built_with_cuda_support
mca:mpi:base:param:mpi_built_with_cuda_support:value:false
mca:mpi:base:param:mpi_built_with_cuda_support:source:default
mca:mpi:base:param:mpi_built_with_cuda_support:status:read-only
mca:mpi:base:param:mpi_built_with_cuda_support:level:4
mca:mpi:base:param:mpi_built_with_cuda_support:help:Whether CUDA GPU buffer support is built into library or not
mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:0:false
mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:1:true
mca:mpi:base:param:mpi_built_with_cuda_support:deprecated:no
mca:mpi:base:param:mpi_built_with_cuda_support:type:bool
mca:mpi:base:param:mpi_built_with_cuda_support:synonym_of:name:opal_built_with_cuda_support
mca:mpi:base:param:mpi_built_with_cuda_support:disabled:false
nvidia@tegra-ubuntu:~$ ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
mca:mpi:base:param:mpi_built_with_cuda_support:value:false
Thanks