Hi, I’m using a Jetson AGX Orin with torch==2.0.0+nv23.05 installed and CUDA 11.4. I’m on this version because it matches my device and can use its GPU. I’m now trying to use RPC in torch to communicate between devices, but torch.distributed.is_available() returns False. I’d like to keep using the current torch version; is there any way to fix this? Any suggestions would help!
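For reference, this is the check that comes back False (assuming python3 is the interpreter the wheel is installed into):

$ python3 -c "import torch; print(torch.__version__, torch.distributed.is_available())"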
Hi,
Here are some suggestions for common issues:
1. Performance
Please run the commands below before benchmarking a deep learning use case:
$ sudo nvpmodel -m 0
$ sudo jetson_clocks
2. Installation
Installation guides for deep learning frameworks on Jetson:
- TensorFlow: https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html
- PyTorch: Installing PyTorch for Jetson Platform - NVIDIA Docs
We also have containers with these frameworks preinstalled (see the example after this list):
Data Science, Machine Learning, AI, HPC Containers | NVIDIA NGC
3. Tutorial
Starter deep learning tutorials:
- Jetson-inference: Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson
- TensorRT sample: Jetson/L4T/TRT Customized Example - eLinux.org
4. Report issue
If these suggestions don’t help and you want to report an issue to us, please share the model, the commands/steps used, and any customized app so we can reproduce it locally.
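As a minimal sketch of the container route mentioned in item 2 above (the image tag is illustrative; pick the one that matches your JetPack/L4T release):

# Start the l4t-pytorch container from NGC with GPU access and host networking
$ sudo docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3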
Thanks!
import os
import sys

import torch
import torch.distributed.rpc as rpc

# Rank 0 ("a") hosts the rendezvous; MASTER_ADDR must be its IP address.
os.environ['MASTER_ADDR'] = '192.168.1.101'
os.environ['MASTER_PORT'] = '29500'


def double_result_on_device_b(x):
    # Runs remotely on the worker named "b".
    return x * 2


if __name__ == "__main__":
    device = sys.argv[1]  # "a" on the AGX Orin, "b" on the Xavier NX
    rank = 0 if device == "a" else 1
    rpc.init_rpc(
        device,  # the worker name this process registers under
        rank=rank,
        world_size=2,
        rpc_backend_options=rpc.TensorPipeRpcBackendOptions()
    )
    if device == "a":
        a = 3
        b = 4
        result = a + b
        # The target name must match the name the rank-1 process registered with ("b").
        fut = rpc.rpc_async("b", double_result_on_device_b, args=(result,))
        print(f"answer:{fut.wait()}")
    # Both workers block here until all outstanding RPC work is done.
    rpc.shutdown()
Here’s my RPC code. I used a Jetson AGX Orin (64 GB RAM) and a Jetson Xavier NX (16 GB RAM) for the experiment. I started the script on each device, as shown below, and got the following error:
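The intended launch (assuming python3 is the interpreter and 192.168.1.101 is the AGX Orin, which takes rank 0):

# On the Jetson AGX Orin (rank 0, registers as worker "a"):
$ python3 rpc.py a
# On the Jetson Xavier NX (rank 1, registers as worker "b"):
$ python3 rpc.py b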
Traceback (most recent call last):
  File "rpc.py", line 18, in <module>
    rpc.init_rpc(
AttributeError: module 'torch.distributed.rpc' has no attribute 'init_rpc'
Hi,
Could you check whether the module exists in the package listed at the link below:
http://jetson.webredirect.org/jp6/cu126
Thanks.
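If it helps, an illustrative way to test this (assuming that link can be used as a pip index; the exact install invocation may differ) would be:

# Install torch from that index, then confirm the RPC entry point exists
$ pip3 install torch --index-url http://jetson.webredirect.org/jp6/cu126
$ python3 -c "import torch.distributed.rpc as rpc; print(hasattr(rpc, 'init_rpc'))"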
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.