Machine:
2 × Jetson AGX Orin 64GB
Environment:
JetPack 5.1.1
Python 3.8.10
NCCL 2.11.4+cuda11.4
PyTorch v1.11.0
The PyTorch build I am using is the one provided by NVIDIA (PyTorch for Jetson).
I am trying to build a distributed development environment across the two AGX Orin devices, using NCCL for communication.
I previously tried PyTorch v2.1.0, but that build does not seem to include the distributed module:
# PyTorch v2.1.0
>>> import torch
>>> torch.distributed.is_available()
False
I then switched to v1.11.0. The distributed module is available there, but NCCL is not:
# PyTorch v1.11.0
>>> import torch
>>> torch.distributed.is_available()
True
>>> torch.distributed.is_nccl_available()
False
>>> torch.cuda.nccl.is_available(torch.randn(1).cuda())
/usr/local/lib/python3.8/dist-packages/torch/cuda/nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
warnings.warn('PyTorch is not compiled with NCCL support')
False
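For anyone who wants to reproduce this on their own build, here is a minimal sketch that gathers the checks from the two sessions above into one script. The function name `nccl_report` and the dictionary keys are my own choices for illustration, not part of PyTorch. The Gloo check is an extra I added only to see which other backend the build offers; I am assuming `is_gloo_available()` is present in the installed version.

```python
import importlib


def nccl_report():
    """Collect the distributed/NCCL availability checks into one dict.

    The function name and the keys are hypothetical; only the
    torch.distributed calls themselves come from PyTorch.
    """
    report = {
        "torch_version": None,
        "distributed": False,
        "nccl_backend": False,
        "gloo_backend": False,
    }
    try:
        torch = importlib.import_module("torch")
        dist = importlib.import_module("torch.distributed")
    except ImportError:
        # PyTorch is not installed in this interpreter; report the defaults.
        return report

    report["torch_version"] = torch.__version__
    report["distributed"] = bool(dist.is_available())

    # The backend helpers are only defined when the distributed
    # package was compiled in, so guard the calls.
    if report["distributed"]:
        report["nccl_backend"] = bool(dist.is_nccl_available())
        report["gloo_backend"] = bool(dist.is_gloo_available())
    return report


if __name__ == "__main__":
    for key, value in nccl_report().items():
        print(f"{key}: {value}")
```

On the v1.11.0 build described above I would expect `distributed` to be True and `nccl_backend` to be False, matching the sessions; I have not recorded the Gloo result.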
Does the Orin support NCCL? If so, how can I get PyTorch to use NCCL on it? Thanks!