PyTorch for Jetson

@dusty_nv Ah, thanks a ton. I totally missed that. :)

Failed to install torchvision from source on an Orin Nano with PyTorch 2.0 and torchvision 0.15.1:

    raise ValueError(f"Unknown CUDA arch ({arch}) or GPU not supported")
ValueError: Unknown CUDA arch (8.7+PTX) or GPU not supported
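
For reference, a quick way to check what compute capability the GPU reports and which architectures the installed PyTorch wheel was built for (just a diagnostic sketch; on Orin Nano the device should report (8, 7)):

#!/usr/bin/env python3
# Diagnostic sketch: compare the GPU's compute capability against the arch
# list compiled into the installed PyTorch wheel.
import torch

print(torch.__version__, torch.version.cuda)
print('device capability:', torch.cuda.get_device_capability(0))   # expect (8, 7) on Orin Nano
print('arch list:', torch.cuda.get_arch_list())                    # e.g. ['sm_87']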

Any suggestions on how to fix it?

Thank you,

Hi @akhilgurram.ai, please see this topic:

Thanks for sharing the link. It works.

Are there any prebuilt PyTorch wheels with fbgemm or qnnpack enabled? I need int8 support for some experiments.
I am using a Jetson Nano with JetPack 4.6.3.

I can also try to build it myself if someone can provide a link to any documentation.

Hi @pramodhrachuri, I don't believe there are - when building the PyTorch wheels for JetPack 4, I had disabled QNNPACK because there were compilation errors at some point in the past (and seeing as QNNPACK is CPU-only, I never really dug into it). If you want to try it, the general procedure that I followed for making the wheels can be found under the Build from Source section at the top of this post.
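
In case it helps, a minimal way to check whether a given wheel was compiled with a quantized backend (a diagnostic sketch, not specific to these wheels):

# Diagnostic sketch: list the quantized engines (fbgemm / qnnpack / onednn)
# this PyTorch build supports; int8 eager-mode quantization needs one of them.
import torch

print(torch.backends.quantized.supported_engines)   # e.g. ['none'] if no backend was compiled in
print(torch.backends.quantized.engine)              # the engine currently selected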

Hi @dusty_nv
I'm working with a Xavier AGX with JetPack 5.1.1 (R35.3.1).
I used the SDK Manager (1.9.2.10899) to install CUDA/cuDNN.
I followed your steps to install torch (2.0.0) and torchvision (0.15.1) for that JetPack version.
All of the above installed with no problems.

I'm running into issues on two fronts: installing Caffe from source and running some workloads with torch. Unfortunately, I don't have the Caffe output handy just yet; my priority is to fix the torch environment. The general error I'm getting there is:
<class ‘RuntimeError’> cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

I installed CUDA 12.1 via:

I installed CUDNN 8.9.1.23 via:
sudo dpkg -i cudnn-local-repo-ubuntu2004-8.9.1.23_1.0-1_arm64.deb
sudo cp /var/cudnn-local-repo-ubuntu2004-8.9.1.23/cudnn-local-828249D0-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get install libcudnn8=8.9.1.23-1+cuda12.1
sudo apt-get install libcudnn8-dev=8.9.1.23-1+cuda12.1
sudo apt-get install libcudnn8-samples=8.9.1.23-1+cuda12.1

In my .bashrc, for CUDA I have added:

export LD_LIBRARY_PATH="/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH"
export CUDA_HOME="/usr/local/cuda-12.1"
export PATH="/usr/local/cuda-12.1/bin:$PATH"
export LD_PRELOAD="/usr/lib/aarch64-linux-gnu/libgomp.so.1"

I run your verification script, and everything is OK.
The code that triggers the error is:

#!/usr/bin/env python3
import torch
import torchvision

print(torch.__version__)
print('CUDA available: ' + str(torch.cuda.is_available()))
print('cuDNN version: ' + str(torch.backends.cudnn.version()))
a = torch.cuda.FloatTensor(2).zero_()
print('Tensor a = ' + str(a))
b = torch.randn(2).cuda()
print('Tensor b = ' + str(b))
c = a + b
print('Tensor c = ' + str(c))
print(torchvision.__version__)

torch.cuda.empty_cache()
device = torch.device('cuda')
torch.nn.functional.conv2d(torch.zeros(32, 32, 32, 32, device=device), torch.zeros(32, 32, 32, 32, device=device))

print("Success!")

I have been snooping around in /usr/local/cuda-12.1/lib64 and include, and I don't see the cuDNN files I would expect (no cudnn* or libcudnn*). If I follow the tar installation steps, I know these get copied in, but I don't know where they live with a .deb installation.

Relevant Stack Trace:

Traceback (most recent call last):
  File "./cuda_torch_test.py", line 18, in <module>
    torch.nn.functional.conv2d(torch.zeros(32, 32, 32, 32, device=device), torch.zeros(32, 32, 32, 32, device=device))
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Do you have any advice?

Thank you

@claxtono these PyTorch wheels were built against the default version of CUDA/cuDNN that comes with JetPack, so you would need to recompile PyTorch if you install a different major version of CUDA/cuDNN. I would recommend just sticking with the default version that SDK Manager installed.

BTW, cuDNN typically gets installed under /usr/lib/aarch64-linux-gnu and /usr/include/aarch64-linux-gnu but I don’t know about the custom-upgraded ones.
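
A small sketch for comparing what the dynamic linker resolves against what PyTorch reports (the soname lookup is just an assumption about how cuDNN is packaged):

# Diagnostic sketch: check which libcudnn soname the linker finds and which
# cuDNN version/availability PyTorch itself reports. A CUDNN_STATUS_NOT_INITIALIZED
# error often means the loaded libcudnn doesn't match what torch was built for.
import ctypes.util
import torch

print('libcudnn found by linker:', ctypes.util.find_library('cudnn'))
print('cuDNN version torch reports:', torch.backends.cudnn.version())
print('cuDNN available:', torch.backends.cudnn.is_available())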


Hi Dusty,

I reverted my CUDA installation to 11.4 as packaged with JetPack.
I am still experiencing the same error in the same place. Is there something else I can check?

Thanks

@claxtono are you able to run my test_pytorch.py script from here?

https://github.com/dusty-nv/jetson-containers/blob/master/test/test_pytorch.py

It does some basic checks of cuDNN, not dissimilar to yours, but I'm curious whether there's some difference. Otherwise, it's hard to say what state your system's environment is in after installing/uninstalling other versions of CUDA, so I might recommend reflashing it, or using the l4t-pytorch container, which already has working versions of those components pre-installed.

Hi Dusty,

Thanks for that. I went the whole distance and reflashed (RIP), but with great results. For future reference, it may be worth adding some of these dependencies (e.g. libopenblas-dev) to the steps.
After the reflash, my steps were as follows.
On each reboot, because I am working on an enterprise wireless network, I set the date manually:

sudo date -s "$(wget -qSO- --max-redirect=0 google.com 2>&1 | grep Date: | cut -d' ' -f5-8)Z"

Prep system:

sudo apt update
sudo apt upgrade
sudo unminimize
sudo apt-get install python3-pip

Add to ~/.bashrc:

export PYTHONPATH="$HOME/.local/bin:$PYTHONPATH"
export PATH="$HOME/.local/bin:$PATH"

Install torch (assumes user has downloaded file to current directory):

pip3 install ./torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl

Install torchvision:

git clone --branch v0.15.1 https://github.com/pytorch/vision torchvision
cd torchvision
sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev
sudo apt-get install libopenblas-dev # Additional torchvision dependency
export BUILD_VERSION=0.15.1
python3 setup.py install --user
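
Optionally, a quick post-build sanity check that the CUDA ops compiled in (a sketch; not part of the original steps):

#!/usr/bin/env python3
# Sanity-check sketch: torchvision's C++/CUDA ops (e.g. nms) only work if the
# source build found CUDA; this raises an error if they weren't compiled in.
import torch
import torchvision
from torchvision.ops import nms

print(torchvision.__version__)   # expect 0.15.1
boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.]], device='cuda')
scores = torch.tensor([0.9, 0.8], device='cuda')
print(nms(boxes, scores, iou_threshold=0.5))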

To run the test script test_pytorch.py:

pip3 install packaging

Outcome: Successful execution of both your verification scripts plus my test script.
I will now resume my efforts to set up caffe. Thanks!!

Hi.
I'm having trouble with torch on my Jetson Orin Nano.
When I try to import it in Python, the interpreter crashes.
I tried running it with the fault handler; here is the output:

python3 -q -X faulthandler

>>> import torch

Fatal Python error: Segmentation fault

Current thread 0x0000ffff8411c010 (most recent call first):
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 1166 in create_module
  File "<frozen importlib._bootstrap>", line 556 in module_from_spec
  File "<frozen importlib._bootstrap>", line 657 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "/home/romain/.local/lib/python3.8/site-packages/torch/__init__.py", line 229 in <module>
  File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 848 in exec_module
  File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 991 in _find_and_load
  File "<stdin>", line 1 in <module>

Segmentation fault (core dumped)

I’m using:
Ubuntu 20.04
Jetpack 5.1.1
python 3.8
torch 2.0.0
torchvision 0.15.1

I’ve installed both torch and torchvision according to the original post and already tried:

export OPENBLAS_CORETYPE=ARMV8

I've been stuck on this for days, so any help would be welcome.
Thanks a lot!
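
One way to narrow a crash like this down is to load the usual native dependencies one at a time, so the last line printed before the segfault points at the culprit (a diagnostic sketch; the OpenBLAS soname below is an assumption about Ubuntu 20.04 packaging):

#!/usr/bin/env python3
# Diagnostic sketch: import native dependencies step by step; whichever step
# the crash follows identifies the problematic shared library.
import ctypes
ctypes.CDLL('libopenblas.so.0')    # assumed soname installed by libopenblas-base/-dev
print('OpenBLAS loaded')
import numpy
print('numpy', numpy.__version__)  # numpy/OpenBLAS mismatches are a common culprit
import torch
print('torch', torch.__version__)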

@rom.boutet0 see my reply from your other post:

I am trying to install torch==1.11.0+cu113 and torchvision==0.12.0+cu113 using PyTorch pip wheels on a Jetson Nano in a conda environment. However, on JetPack 5, when I try the following command for the installation, I get the following error message:

pip3 install ./Downloads/torch-1.11.0-cp38-cp38-linux-aarch64.whl

ERROR: torch-1.11.0-cp38-cp38-linux_aarch64.whl is not a supported wheel on this platform.
Do you know how I can resolve this issue?

Also, I have tried installing them without JetPack, using the following:

conda install pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch

and it still did not work. Could you help me with this?

Thanks,

Hi @vahdat.melika, the PyTorch wheels that were built for Jetson can be found at the top of this post or here. Have you tried installing one of those?

Where did you download this wheel from and what does pip3 --version show for you?

Hi @dusty_nv,

Yes, I have tried the PyTorch wheels built for Jetson from the first post at the top of this thread. I installed this wheel (https://nvidia.box.com/shared/static/ssf2v7pf5i245fk4i0q926hy4imzs2ph.whl) from the JetPack 5 / PyTorch v1.11.0 section. Here is what pip3 --version shows for me:

pip 23.0.1 from /home/agas01/miniconda3/envs/scorp_n/lib/python3.8/site-packages/pip (python 3.8)

Thank you,

If you disable conda, does it install okay then? I'm not sure why, but some other users have reported similar messages about virtualenv and conda.
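
A small sketch for checking which wheel tags the active interpreter accepts (conda environments can report a different tag set than the system Python), using the packaging module:

# Sketch: list the wheel tags this Python environment accepts, to compare
# against the wheel's cp38-cp38-linux_aarch64 tag.
from packaging import tags   # pip3 install packaging

for t in list(tags.sys_tags())[:10]:
    print(t)                 # e.g. cp38-cp38-manylinux_2_17_aarch64, cp38-cp38-linux_aarch64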

I had to format the SD card and install everything from scratch because my Jetson Orin Nano was no longer reading the keyboard, so I started following this post again. After following the steps to install JetPack on my Jetson Orin Nano, I got this error:

E: The repository 'https://repo.download.nvidia.com/jetson/common/orin-nano r35.3 Release' does not have a Release file.

This error prevents me from installing JetPack and continuing with the PyTorch installation. I had encountered the same issue before formatting the SD card as well. Could this be why the PyTorch wheel is not compatible, and is there any solution to this issue?
I appreciate your time!
Thanks,

Hi @vahdat.melika, the SD card image for Jetson Orin Nano Developer Kit should already come with the JetPack components pre-installed (including CUDA Toolkit, cuDNN, TensorRT, OpenCV, VPI, etc.). So you shouldn't have to install much before PyTorch, other than what is in the Installation section at the top of this post.

On a fresh SD card, are you able to do sudo apt-get update?

You can also try changing your /etc/apt/sources.list.d/nvidia-l4t-apt-source.list to reflect the following:

deb https://repo.download.nvidia.com/jetson/common r35.3 main
deb https://repo.download.nvidia.com/jetson/t234 r35.3 main

If you continue having issues getting set up, I recommend trying the l4t-pytorch container, which comes with the PyTorch/torchvision packages pre-installed.

Hi @dusty_nv,

Thank you for your quick response. I will try changing /etc/apt/sources.list.d/nvidia-l4t-apt-source.list and the l4t-pytorch container. When I do sudo apt-get update, I get this error:

Err:8 https://repo.download.nvidia.com/jetson/common/orin-nano r35.3 Release   
  404  Not Found [IP: 184.150.70.89 443]
Reading package lists... Done
E: The repository 'https://repo.download.nvidia.com/jetson/common/orin-nano r35.3 Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.