I am working on data parallel with pyTorch, but I got this error:
from torch.distributed import init_process_group, destroy_process_group
ImportError: cannot import name ‘init_process_group’ from ‘torch.distributed’
Is the pyTorch for Jetson not the same as the pyTorch?
I can run my program on my Linux Desktop, but got this error when I ran on Jetson.
Hi @hlau2, that PyTorch wheel for Jetson wasn’t built with USE_DISTRIBUTED enabled, so it doesn’t have torch.distributed available. You can either disable the code, or rebuild PyTorch with USE_DISTRIBUTED.
You can find instructions on building PyTorch from source in this topic:
I cloned the pyTorch v2.1.0, and then installed the packages, and
export USE_NCCL=0
export USE_DISTRIBUTED=1
export USE_QNNPACK=0
export USE_PYTORCH_QNNPACK=0
export TORCH_CUDA_ARCH_LIST=“7.2;8.7”
export PYTORCH_BUILD_VERSION=2.1.0
export PYTORCH_BUILD_NUMBER=1
and then python3 setup.py bdist_wheel
it tooks few hours and stuck, because of memory. is it correct?
UPDATE: It actually dead.
g++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
[413/1605] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LogcumsumexpKernel.cu.o
ninja: build stopped: subcommand failed.
@hlau2 I haven’t used distributed mode, but I would check torch.__config__.show() and the PyTorch source to see what torch.distributed.is_available() is checking for.