Are there any PyTorch versions that support torch.distributed with the NCCL backend on the Jetson Orin Nano?

I’ve tried downloading the wheel files from the Jetson page on developer.nvidia.com, but torch-2.1.0 doesn’t support torch.distributed and torch-1.11.0 doesn’t support the NCCL backend. How can I get a PyTorch build whose “distributed” package supports the NCCL backend on the GPU of a Jetson device?

Also, I’ve seen some responses saying that the NCCL backend is not supported on the Jetson platform. Does that hold for all versions of CUDA, PyTorch, and JetPack? A pre-compiled PyTorch wheel that supports it would be ideal; if there isn’t one, is it possible to build such a torch wheel on my own?
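
For reference, here is a quick way to check what a given wheel actually supports. This is just a sketch using the standard torch.distributed query functions, run inside the environment where the wheel is installed:

# Minimal check of what the installed PyTorch wheel supports.
import torch
import torch.distributed as dist

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("torch.distributed enabled:", dist.is_available())
if dist.is_available():
    print("NCCL backend:", dist.is_nccl_available())
    print("MPI backend:", dist.is_mpi_available())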

NCCL is for multi-GPU systems. May I know your use case?
The precompiled wheels can be found at Index of /compute/redist/jp

I’ve tried the JetPack 5.1.2 (jp/v512) pre-built wheel of pytorch-2.1.0, but torch.distributed is not supported on Jetson devices.
I want to test distributed inference on a multi-device system using the GPU. The torch.distributed documentation lists NCCL as the communication backend for GPUs, while GPU support for the MPI backend is marked with a “?”.
Also, since the communication is mainly bounded by network bandwidth, MPI and NCCL may have similar performance for distributed communication. The MPI backend could be acceptable for GPU communication, but I’d like to know whether it actually works for distributed communication between GPUs, and how large the performance gap between the two is.
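
To make the question concrete, this is roughly the test I want to run (a minimal sketch; the file name test_dist.py and the two-process mpirun launch are only illustrative assumptions):

# test_dist.py - minimal point-to-point transfer over the MPI backend.
# Launched with, e.g.: mpirun -np 2 python test_dist.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="mpi")   # rank and world size come from the MPI launcher
rank = dist.get_rank()

t = torch.ones(4, device="cuda") * rank  # GPU tensor; a direct transfer needs CUDA-aware MPI

if rank == 0:
    dist.send(t, dst=1)                  # send the GPU tensor to rank 1
elif rank == 1:
    dist.recv(t, src=0)                  # receive into the GPU tensor
    print("rank 1 received:", t)

dist.destroy_process_group()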

Hi,

Please reflash your device to JetPack 6.1 or 6.2.
Then download and install the package in the below link:
https://pypi.jetson-ai-lab.dev/jp6/cu126

Below is a related discussion for your reference:

Thanks.

Hi AsstaLLL,
I’ve now seen that issue. So for the pre-built PyTorch wheels, only the wheels matching JetPack 6.1 and later have the NCCL backend enabled on the GPU? Thanks again for your reply!

Hi,

PyTorch distributed is enabled, but it uses MPI instead of NCCL.
Please see below for further info:

Thanks.

I’ve tried the torch-1.11 wheel from “PyTorch for Jetson” (Jetson & Embedded Systems / Announcements, NVIDIA Developer Forums) on the Jetson Orin Nano. Indeed, torch.cuda.is_available() is True and torch.distributed.is_mpi_available() is True, but when I transmit a tensor on the GPU using torch.distributed with the MPI backend, the following error occurs:

RuntimeError: CUDA tensor detected and the MPI used doesn’t have CUDA-aware MPI support

I’d like to ask: what is the easiest approach to support distributed communication of CUDA tensors? As mentioned before, do you mean NCCL becomes available with JetPack 6.1 or 6.2?
And on the current system (JetPack 5.1.2 with CUDA 11.4 on the Jetson Orin Nano), is the only way to communicate CUDA tensors to move them to the CPU, transmit them to the other device, and reload them onto the GPU?
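
To be explicit, the CPU-staging workaround I mean would look roughly like this (a sketch only, again assuming the MPI backend and two ranks launched with mpirun):

# Sketch of the CPU-staging workaround: copy to host, transfer, copy back to the device.
import torch
import torch.distributed as dist

dist.init_process_group(backend="mpi")
rank = dist.get_rank()

if rank == 0:
    gpu_tensor = torch.randn(4, device="cuda")
    dist.send(gpu_tensor.cpu(), dst=1)   # stage through host memory
elif rank == 1:
    cpu_buf = torch.empty(4)             # CPU receive buffer
    dist.recv(cpu_buf, src=0)
    gpu_tensor = cpu_buf.to("cuda")      # reload onto the GPU
    print("rank 1 received on GPU:", gpu_tensor)

dist.destroy_process_group()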
Many thanks for your reply!

Hi,

torch.distributed uses MPI as the backend.
Based on the error, have you built OpenMPI with CUDA support?

Thanks.

I haven’t built OpenMPI with CUDA support. Do you mean [OpenMPI with CUDA support] should be built in addition, as shown in the .sh file?
And with a CUDA-enabled OpenMPI, can CUDA tensors be transferred with MPI? Can it work with JetPack 5.1.2?
I will try to build it.
Thanks!

Hi,

Would you mind giving it a try directly?

We have enabled the torch.distributed flag, and the backend uses MPI.
It seems to work correctly, since we haven’t received any reports about the functionality.
But we are not sure whether our users transmit tensors on the CPU or the GPU.

As shared above, the CUDA support in OpenMPI should also work.
So it’s recommended to give it a try and see whether GPU transmission with torch.distributed works.
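
For example, a small probe like the one below can show whether GPU buffers go through. This is only a sketch (the file name gpu_mpi_probe.py and the two-rank launch are illustrative assumptions):

# gpu_mpi_probe.py - check whether the MPI backend accepts CUDA tensors.
# Launched with, e.g.: mpirun -np 2 python gpu_mpi_probe.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="mpi")
rank = dist.get_rank()

t = torch.full((2,), float(rank), device="cuda")
try:
    dist.all_reduce(t)   # sums the tensor across all ranks, directly on the GPU
    print(f"rank {rank}: GPU all_reduce succeeded -> {t.tolist()}")
except RuntimeError as err:
    # Typically "CUDA tensor detected and the MPI used doesn't have CUDA-aware MPI support"
    print(f"rank {rank}: GPU all_reduce failed: {err}")

dist.destroy_process_group()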

Thanks.

Hi,
After running build_openMPI.sh, the tail of its output was as follows:


Does this mean that the CUDA-aware MPI was successfully built and installed?
Besides, on the Jetson device, pytorch-1.11 with torch.distributed enabled is installed inside a virtual environment (python-venv). Is the CUDA-aware MPI backend accessible now, or is there something else I should do to enable it after running build_openMPI.sh?
Thanks!

Hi,

Would you mind giving it a try?
As OpenMPI and PyTorch are third-party libraries, we don’t have a QA plan for them.

It may work, as it seems all the components can be built on Jetson.
But this is not our official release.

Thanks.

I still have a problem building the CUDA-aware OpenMPI: running build_openMPI.sh always reports mpi_with_cuda_support as False.
I tried running the build script with the “--with-cuda-libdir=/usr/lib/aarch64-linux-gnu” option added, just like what was done in:

But the output showed “configure: WARNING: unrecognized options: --with-cuda-libdir”. Can you offer some help? Thanks for your time.

Hi,

I still have a problem building the CUDA-aware OpenMPI: running build_openMPI.sh always reports mpi_with_cuda_support as False.

Do you see any errors when running the script?
Or was it built successfully but the CUDA support is not enabled?

Thanks.

Hi,
I didn’t see any explicit error output, and OpenMPI builds successfully, just with “mpi_with_cuda_support: False”.
Here is the full output from running the script with the “--with-cuda-libdir=/usr/lib/aarch64-linux-gnu” option; I don’t know whether this will be helpful.
build.log (467.0 KB)
Thanks.

Hi,

In your log, CUDA support is enabled:

Open MPI configuration:
-----------------------
Version: 5.1.0a1
Build MPI C bindings: yes
Build MPI Fortran bindings: no
Build MPI Java bindings (experimental): no
Build Open SHMEM support: yes
Debug build: no
Platform file: (none)

Miscellaneous
-----------------------
Atomics: C11 atomics
CUDA support: yes
...

How did you verify CUDA support in OpenMPI?
Could you try the below command:

$ ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

Thanks.

Hi,
The verification info is shown below:
nvidia@tegra-ubuntu:~$ ompi_info --parsable --all | grep mpi_built_with_cuda_support
mca:opal:base:param:opal_built_with_cuda_support:synonym:name:mpi_built_with_cuda_support
mca:mpi:base:param:mpi_built_with_cuda_support:value:false
mca:mpi:base:param:mpi_built_with_cuda_support:source:default
mca:mpi:base:param:mpi_built_with_cuda_support:status:read-only
mca:mpi:base:param:mpi_built_with_cuda_support:level:4
mca:mpi:base:param:mpi_built_with_cuda_support:help:Whether CUDA GPU buffer support is built into library or not
mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:0:false
mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:1:true
mca:mpi:base:param:mpi_built_with_cuda_support:deprecated:no
mca:mpi:base:param:mpi_built_with_cuda_support:type:bool
mca:mpi:base:param:mpi_built_with_cuda_support:synonym_of:name:opal_built_with_cuda_support
mca:mpi:base:param:mpi_built_with_cuda_support:disabled:false
nvidia@tegra-ubuntu:~$ ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
mca:mpi:base:param:mpi_built_with_cuda_support:value:false
Thanks

Hi,

We gave it a try on a device with JetPack 5.1.4, using the script shared in the comment below:

It works correctly and GPU support is enabled.
Could you double-check it again?

$ sudo chmod +x build_openMPI.sh 
$ ./build_openMPI.sh 
$ export CUDA_HOME="/usr/local/cuda"
$ export UCX_HOME="/usr/local/ucx"
$ export OMPI_HOME="/usr/local/ompi"
$ export PATH="${CUDA_HOME}/bin:${UCX_HOME}/bin:${OMPI_HOME}/bin:$PATH"
$ export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${UCX_HOME}/lib64:${OMPI_HOME}/lib64:$LD_LIBRARY_PATH"
$ ompi_info --parsable --all | grep mpi_built_with_cuda_support:value
mca:mpi:base:param:mpi_built_with_cuda_support:value:true

Thanks.