Torch 2.5.0: module 'torch.distributed' has no attribute 'init_process_group' on JetPack 6.1

Issue:
I installed torch-2.5.0a0+872d972e41.nv24.08 with JetPack 6.1 on an AGX Orin and found that the torch package is missing the 'torch.distributed' module. The error "module 'torch.distributed' has no attribute 'init_process_group'" appears when running the following commands in Python:

~$ python

Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
2.5.0a0+872d972e41.nv24.08
>>> import torch.distributed as dist
>>> dist.init_process_group(backend="nccl")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'torch.distributed' has no attribute 'init_process_group'

Environment:
device: AGX Orin
torch version: torch-2.5.0a0+872d972e41.nv24.08.17622132-cp310-cp310-linux_aarch64.whl, downloaded via the PyTorch for JetPack 6.1 link on the Jetson Download Center | NVIDIA Developer page
JetPack: 6.1
Python: 3.10.12

Question:
I tried to rebuild PyTorch 2.5.0 myself but failed. How can I get a full build of PyTorch 2.5.0 for JetPack 6.1?

Hi,

Please try the PyTorch package in the below link:

http://jetson.webredirect.org/jp6/cu126

We have confirmed that the package has been built with the distributed module.
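A quick way to verify this on the device, for example:

import torch.distributed as dist
# True when the wheel was built with distributed support
print(dist.is_available())
# NCCL backend support can be checked separately
print(dist.is_nccl_available())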

Thanks.

Hi AastaLL:

I hit a new issue after installing the new torch 2.5.0 you mentioned; please refer to the following error message:

>>> import torch
>>> import torch.distributed as dist
>>> dist.init_process_group(backend="nccl")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 83, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 97, in wrapper
    func_return = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 1520, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/rendezvous.py", line 258, in _env_rendezvous_handler
    rank = int(_get_env_or_raise("RANK"))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/rendezvous.py", line 243, in _get_env_or_raise
    raise _env_error(env_var)
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable RANK expected, but not set

I tried installing the new torch 2.5.0 in three environments; the issue is the same on the AGX Orin device natively, in a virtual environment, and in Docker.

Thanks for checking this issue!

Hi AastaLL:

The above issue turns out not to be a torch issue; torch works fine once the environment variables for the env:// rendezvous are set as follows.

import os
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '5678'
os.environ['RANK'] = '0'
os.environ['WORLD_SIZE'] = '1'
import torch
import torch.distributed as dist
dist.init_process_group(backend="nccl")
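For reference, the same initialization can also be done without exporting the variables by hand; for example, launching the script with torchrun sets RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT automatically, or the values can be passed directly to init_process_group (the script name below is just a placeholder):

# Option 1: let torchrun set the rendezvous environment variables (run from a shell)
#   torchrun --standalone --nproc_per_node=1 my_script.py

# Option 2: pass the rendezvous values explicitly instead of relying on env://
import torch.distributed as dist
dist.init_process_group(backend="nccl", init_method="tcp://localhost:5678", rank=0, world_size=1)
dist.destroy_process_group()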

Thanks a lot for your help!!!

Hi,

Good to know it works now!
Thanks for the update.
