Issue installing PyTorch on the Jetson Orin platform

I am trying to install PyTorch on my Jetson Orin platform:

  • my JetPack version is 5.1, CUDA version is 11.4

I am following the instructions from NVIDIA here: Installing PyTorch for Jetson Platform - NVIDIA Docs

After installing all the required prerequisites, I selected the “right” wheel and used this command to install it:
pip3 install --no-cache https://developer.download.nvidia.com/compute/redist/jp/v51/pytorch/torch-1.14.0a0+44dac51c.nv23.02-cp38-cp38-linux_aarch64.whl

The installation failed with error:
ERROR: torch-1.14.0a0+44dac51c.nv23.02-cp38-cp38-linux_aarch64.whl is not a supported wheel on this platform

I tried the other two wheels there as well, with similar error messages.
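For reference, a quick way to see which wheel tags a given pip will accept (the pip debug subcommand exists in modern pip but is marked experimental, so treat this as a sketch):

pip3 debug --verbose | grep cp3

A cp38 wheel like the one above will only install if one of the listed compatible tags starts with cp38.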

After using pip or conda to install aiohttp, scipy, numpy, protobuf, etc., I was finally able to install the same https://developer.download.nvidia.com/compute/redist/jp/v51/pytorch/torch-1.14.0a0+44dac51c.nv23.02-cp38-cp38-linux_aarch64.whl without error:

sudo pip install --no-cache https://developer.download.nvidia.com/compute/redist/jp/v51/pytorch/torch-1.14.0a0+44dac51c.nv23.01-cp38-cp38-linux_aarch64.whl

After some testing it seems to me that torch is installed correctly, but I don’t know why torch.cuda.is_available() returns False, even though I have installed JetPack 5.1 multiple times with CUDA support ON.
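A quick way to check whether the wheel you ended up with was even built with CUDA (torch.version.cuda prints None on CPU-only builds):

python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"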

My deviceQuery returns:
Detected 1 CUDA Capable device(s)

Device 0: "Orin"
CUDA Driver Version / Runtime Version 11.4 / 11.4
CUDA Capability Major/Minor version number: 8.7
Total amount of global memory: 30588 MBytes (32074158080 bytes)
(016) Multiprocessors, (128) CUDA Cores/MP: 2048 CUDA Cores
GPU Max Clock rate: 1300 MHz (1.30 GHz)
Memory Clock rate: 1300 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 167936 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS

I really want to fully use the Jetson Orin’s DLA cores and GPU cores for my ML workload. If torch.cuda.is_available() is False, I am guessing only my CPU cores will be used; am I right?
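For context, the usual device-selection pattern falls back to the CPU in exactly this case; a minimal sketch:

import torch
# with is_available() returning False, everything lands on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.rand(2, 2).to(device)
print(x.device)  # prints "cpu" when CUDA is unavailable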

Hi @hank.fang.usa, I’m not sure why pip3 is giving you that error while pip isn’t - if you are using a virtualenv or conda, please try disabling it. Can you show the output from this:

pip --version
pip3 --version
which pip
which pip3

Also can you check the installed PyTorch version?

python3 -c 'import torch; print(torch.__version__)'

What I would do is uninstall torch from pip/pip3 and start fresh. Until then, you can also use the l4t-pytorch or l4t-ml containers which come with PyTorch/torchvision pre-installed with GPU support.
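For example (the container tag should match your L4T release; r35.2.1 is assumed here as the JetPack 5.1 tag):

docker run -it --rm --runtime nvidia --network host nvcr.io/nvidia/l4t-pytorch:r35.2.1-pth2.0-py3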

Thank you for your reply. I don’t think the version of my pip or pip3 makes any difference; I checked and they are the same (Python 3.10 in my virtual environment). The installed torch version is 2.0.1.

Sometimes when I try to install a package I have to install it from the Anaconda UI, and other times with the conda or pip command. It doesn’t make much sense to me, but today I separated the long installation command into multiple steps, installed the dependencies individually first, and then finally installed the PyTorch wheel from the NVIDIA link.

I am still trying to figure out why torch.cuda.is_available() returns False.

Regarding this: “you can also use the l4t-pytorch or l4t-ml containers which come with PyTorch/torchvision pre-installed with GPU support.”

I can run many ML models on my Jetson Orin, but all my DLA cores are OFFLINE; I have no trouble seeing activity on the CPU and GPU cores. The whole purpose of using the Jetson Orin, for me, is to use the highly efficient DLA engines. Is it possible that, since the Jetson Orin is powerful enough to handle my workload with just the CPU/GPU cores, there is no need for DLA0/DLA1 to kick in? Can you please confirm?

@hank.fang.usa those PyTorch wheels from https://developer.download.nvidia.com/compute/redist/jp/v51/pytorch are built for Python 3.8 (we build the wheels for the default version of Python that comes with Ubuntu), so that explains the “is not a supported wheel on this platform” error that you were getting. If you disable your virtualenv and go back to Python 3.8, those wheels should install. The PyTorch 2.0.1 that you have installed now for Python 3.10 probably came from PyPI and wasn’t built with GPU support.
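If you want to stay with conda, a fresh Python 3.8 environment should also let the cp38 wheel install (a sketch; the environment name is arbitrary):

conda create -n torch-py38 python=3.8
conda activate torch-py38
pip install --no-cache https://developer.download.nvidia.com/compute/redist/jp/v51/pytorch/torch-1.14.0a0+44dac51c.nv23.02-cp38-cp38-linux_aarch64.whl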

In order to utilize the DLA cores you need to use TensorRT - PyTorch on its own isn’t normally able to use them. You can run PyTorch models on DLA through TensorRT with the torch2trt tool or by exporting them to ONNX.
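A minimal sketch of the ONNX route (resnet18 and the file names are placeholders; trtexec’s --useDLACore and --allowGPUFallback flags place the engine on a DLA core with GPU fallback for unsupported layers, and DLA requires FP16 or INT8 precision):

import torch
import torchvision

# export an example model to ONNX
model = torchvision.models.resnet18(pretrained=True).eval()
x = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, x, "model.onnx", input_names=["input"], output_names=["output"])

Then build a TensorRT engine on DLA core 0:

/usr/src/tensorrt/bin/trtexec --onnx=model.onnx --fp16 --useDLACore=0 --allowGPUFallback --saveEngine=model.engine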

I see, it makes sense now. Thank you @dusty_nv

new update on this:
Step 1: I switched to Python 3.8 and reinstalled with: python3 -m pip install --no-cache https://developer.download.nvidia.com/compute/redist/jp/v51/pytorch/torch-2.0.0a0+8aa34602.nv23.03-cp38-cp38-linux_aarch64.whl

Step 2: I checked torch.cuda.is_available() in Python 3, and it returns True! (great)

Step 3: but when I run "python3 train.py model_bn --checkpoint_path=data/model_bn.pth" (following the tutorial GitHub - NVIDIA-AI-IOT/jetson_dla_tutorial: A tutorial for getting started with the Deep Learning Accelerator (DLA) on NVIDIA Jetson), it says "AssertionError: Torch not compiled with CUDA enabled".

complete log of step 3:
/home/mmca/anaconda3/envs/py_3816/lib/python3.8/site-packages/torchvision/models/detection/anchor_utils.py:63: UserWarning: Failed to initialize NumPy: module compiled against API version 0x10 but this version of numpy is 0xd. Check the section C-API incompatibility in the Troubleshooting ImportError section of the NumPy manual for indications on how to solve this problem. (Triggered internally at /root/pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
  device: torch.device = torch.device("cpu"),
Traceback (most recent call last):
  File "train.py", line 30, in <module>
    model = MODELS[args.model_name]().cuda()
  File "/home/mmca/anaconda3/envs/py_3816/lib/python3.8/site-packages/torch/nn/modules/module.py", line 905, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/mmca/anaconda3/envs/py_3816/lib/python3.8/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/mmca/anaconda3/envs/py_3816/lib/python3.8/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/mmca/anaconda3/envs/py_3816/lib/python3.8/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/mmca/anaconda3/envs/py_3816/lib/python3.8/site-packages/torch/nn/modules/module.py", line 905, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/mmca/anaconda3/envs/py_3816/lib/python3.8/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

How can this happen? (Is the Torch wheel provided by NVIDIA not compiled with CUDA enabled???) That exact link is in my Step 1.

I tried another wheel, torch-1.14.0a0+44dac51c.nv23.01-cp38-cp38-linux_aarch64.whl, and it works…
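For anyone else verifying a working install, a quick sanity check (a sketch) that actually allocates a tensor on the GPU:

import torch
x = torch.rand(3, 3).cuda()            # raises if the build has no CUDA support
print(x.device)                        # expect: cuda:0
print(torch.cuda.get_device_name(0))   # expect: Orin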

OK, glad you got it working with the other wheel - my guess is that it was related to your conda environment and another torchvision or PyTorch (one without CUDA) being installed in there.
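One way to check for that kind of shadowing is to print which installation actually gets imported, and list everything torch-related in the environment:

python3 -c 'import torch; print(torch.__file__)'
pip list | grep -i torch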
