PyTorch does not support NCCL

machine:

2 × Jetson AGX Orin 64GB

environment:

Jetpack   5.1.1
Python    3.8.10
NCCL      2.11.4+cuda11.4
PyTorch   v1.11.0

The PyTorch build I am using is the one provided by NVIDIA:
PyTorch for Jetson
I am trying to set up a distributed development environment on the two AGX Orin devices, with communication over NCCL.
I previously tried PyTorch v2.1, but that build does not seem to provide the distributed module.

# PyTorch v2.1.0
>>> import torch
>>> torch.distributed.is_available()
False

Then I switched to v1.11.0, but I ran into the following problem:

# PyTorch v1.11.0
>>> import torch
>>> torch.distributed.is_available()
True
>>> torch.distributed.is_nccl_available()
False
>>> torch.cuda.nccl.is_available(torch.randn(1).cuda())
/usr/local/lib/python3.8/dist-packages/torch/cuda/nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
  warnings.warn('PyTorch is not compiled with NCCL support')
False

Does the Orin support NCCL? If so, how can I get NCCL working with PyTorch? Thanks!

Hi @whoops, NCCL is not supported on the Jetson platform. You can build PyTorch with USE_DISTRIBUTED enabled, and it will use MPI instead of NCCL.

Thank you for the quick reply, @dusty_nv. Will the Jetson platform support NCCL in the future?
