PyTorch for Jetson

@vahdat.melika, my JetPack 5.1.1 / L4T R35.3.1 install doesn’t have that specific orin-nano line in its apt sources; mine looks like the one above (although it’s possible I’m running a slightly different build here).

@dusty_nv, my JetPack apt sources don’t have an orin-nano line either. This is what I have in my apt sources:

deb r35.3 main

I do not have the line with common in it.
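The repository URIs in the apt source lines above were lost from the posts, which makes them hard to compare by eye. A minimal Python sketch that lists the repository names in a one-line-style sources file might look like this. parse_deb_lines and repo_names are hypothetical helpers, and the hosts used in the example are placeholders, not NVIDIA's real repository URLs.

```python
def parse_deb_lines(text):
    """Return (uri, suite, components) for every one-line 'deb' entry in text."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        fields = line.split()
        if not fields or fields[0] != "deb":
            continue  # blank lines, comments, deb-src entries, ...
        rest = fields[1:]
        if rest and rest[0].startswith("["):
            # skip an options block such as "[arch=arm64]", which may span several fields
            while rest and not rest[0].endswith("]"):
                rest = rest[1:]
            rest = rest[1:]
        if len(rest) >= 3:  # URI, suite and at least one component
            entries.append((rest[0], rest[1], rest[2:]))
    return entries


def repo_names(entries):
    """Last path segment of each URI, e.g. 'common' for .../jetson/common."""
    return [uri.rstrip("/").rsplit("/", 1)[-1] for uri, _suite, _components in entries]
```

On the device you could feed it the contents of /etc/apt/sources.list.d/nvidia-l4t-apt-source.list (reading it doesn't need sudo) and check whether "common" appears in the result.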

Hmm sorry, I’m not sure where that is coming from then 🤔
grep -r -i "orin-nano" /etc doesn’t turn anything up on my system (an Orin Nano devkit running the JetPack 5.1.1 SD card image).

No worries! I will try the container and see what happens. When I run grep -r -i "orin-nano" /etc, I get a bunch of permission denied errors. I also tried adding the first line, the one with common, to my apt sources list (deb r35.3 main) using the nano editor, but it doesn’t let me save the changes (I get permission denied again).
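The permission denied messages come from files that only root can read; they don’t by themselves mean the search failed. grep’s -s flag suppresses those messages, and running the search with sudo also covers the root-only files. For comparison, a small Python sketch of the same case-insensitive recursive search that quietly skips unreadable files; find_text is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import os


def find_text(root, needle):
    """Yield files under root whose contents contain needle, ignoring case.

    Unreadable files are skipped silently instead of printing errors,
    similar in spirit to `grep -r -i -s`.
    """
    needle = needle.lower()
    # os.walk ignores directories it cannot list unless an onerror callback is given
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs and dangling symlinks
            try:
                with open(path, "r", errors="ignore") as f:
                    if needle in f.read().lower():
                        yield path
            except OSError:  # e.g. PermissionError on root-only files
                continue
```

For example, list(find_text("/etc", "orin-nano")) returns the matching paths it was allowed to read.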

@dusty_nv I don’t think the JetPack toolkit is installed for me. When I tried to get its version to match it with the right tag for the container, I didn’t get any version: dpkg -l | grep nvidia-jetpack doesn’t display anything, and the terminal doesn’t recognize nvidia-jetpack as a command. I feel like some files were not installed properly on the SD card. This could be because I was on my company’s restricted network when flashing the image. I will reformat the SD card, but this time I will use a non-restricted network.

You should be able to save your changes if you edit it with sudo (i.e. by running sudo nano /etc/apt/sources.list.d/nvidia-l4t-apt-source.list)

That also shows nothing for me; I think that’s because nvidia-jetpack is a metapackage from apt rather than an actual Debian package. However, dpkg -l | grep nvidia shows this for me:

 dpkg -l | grep nvidia
ii  libnvidia-container-tools                   1.10.0-1                             arm64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container0:arm64                  0.11.0+jetpack                       arm64        NVIDIA container runtime library
ii  libnvidia-container1:arm64                  1.10.0-1                             arm64        NVIDIA container runtime library
ii  nvidia-container-runtime                    3.9.0-1                              all          NVIDIA container runtime
ii  nvidia-container-toolkit                    1.11.0~rc.1-1                        arm64        NVIDIA Container toolkit
ii  nvidia-docker2                              2.11.0-1                             all          nvidia-docker CLI wrapper
ii  nvidia-l4t-3d-core                          35.3.1-20230315004708                arm64        NVIDIA GL EGL Package
ii  nvidia-l4t-apt-source                       35.3.1-20230315004708                arm64        NVIDIA L4T apt source list debian package
ii  nvidia-l4t-bootloader                       35.3.1-20230315004708                arm64        NVIDIA Bootloader Package
ii  nvidia-l4t-camera                           35.3.1-20230315004708                arm64        NVIDIA Camera Package
ii  nvidia-l4t-configs                          35.3.1-20230315004708                arm64        NVIDIA configs debian package
ii  nvidia-l4t-core                             35.3.1-20230315004708                arm64        NVIDIA Core Package
ii  nvidia-l4t-cuda                             35.3.1-20230315004708                arm64        NVIDIA CUDA Package
ii  nvidia-l4t-display-kernel                   5.10.104-tegra-35.3.1-20230315004708 arm64        NVIDIA Display Kernel Modules Package
ii  nvidia-l4t-firmware                         35.3.1-20230315004708                arm64        NVIDIA Firmware Package
ii  nvidia-l4t-gbm                              35.3.1-20230315004708                arm64        NVIDIA GBM Package
ii  nvidia-l4t-graphics-demos                   35.3.1-20230315004708                arm64        NVIDIA graphics demo applications
ii  nvidia-l4t-gstreamer                        35.3.1-20230315004708                arm64        NVIDIA GST Application files
ii  nvidia-l4t-init                             35.3.1-20230315004708                arm64        NVIDIA Init debian package
ii  nvidia-l4t-initrd                           35.3.1-20230315004708                arm64        NVIDIA initrd debian package
ii  nvidia-l4t-jetson-io                        35.3.1-20230315004708                arm64        NVIDIA Jetson.IO debian package
ii  nvidia-l4t-jetson-multimedia-api            35.3.1-20230315004708                arm64        NVIDIA Jetson Multimedia API is a collection of lower-level APIs that support flexible application development.
ii  nvidia-l4t-jetsonpower-gui-tools            35.3.1-20230315004708                arm64        NVIDIA Jetson Power GUI Tools debian package
ii  nvidia-l4t-kernel                           5.10.104-tegra-35.3.1-20230315004708 arm64        NVIDIA Kernel Package
ii  nvidia-l4t-kernel-dtbs                      5.10.104-tegra-35.3.1-20230315004708 arm64        NVIDIA Kernel DTB Package
ii  nvidia-l4t-kernel-headers                   5.10.104-tegra-35.3.1-20230315004708 arm64        NVIDIA Linux Tegra Kernel Headers Package
ii  nvidia-l4t-libvulkan                        35.3.1-20230315004708                arm64        NVIDIA Vulkan Loader Package
ii  nvidia-l4t-multimedia                       35.3.1-20230315004708                arm64        NVIDIA Multimedia Package
ii  nvidia-l4t-multimedia-utils                 35.3.1-20230315004708                arm64        NVIDIA Multimedia Package
ii  nvidia-l4t-nvfancontrol                     35.3.1-20230315004708                arm64        NVIDIA Nvfancontrol debian package
ii  nvidia-l4t-nvpmodel                         35.3.1-20230315004708                arm64        NVIDIA Nvpmodel debian package
ii  nvidia-l4t-nvpmodel-gui-tools               35.3.1-20230315004708                arm64        NVIDIA Nvpmodel GUI Tools debian package
ii  nvidia-l4t-nvsci                            35.3.1-20230315004708                arm64        NVIDIA NvSci Package
ii  nvidia-l4t-oem-config                       35.3.1-20230315004708                arm64        NVIDIA OEM-Config Package
ii  nvidia-l4t-openwfd                          35.3.1-20230315004708                arm64        NVIDIA OpenWFD Package
ii  nvidia-l4t-optee                            35.3.1-20230315004708                arm64        OP-TEE userspace daemons, test programs and libraries
ii  nvidia-l4t-pva                              35.3.1-20230315004708                arm64        NVIDIA PVA Package
ii  nvidia-l4t-tools                            35.3.1-20230315004708                arm64        NVIDIA Public Test Tools Package
ii  nvidia-l4t-vulkan-sc                        35.3.1-20230315004708                arm64        NVIDIA Vulkan SC run-time package
ii  nvidia-l4t-vulkan-sc-dev                    35.3.1-20230315004708                arm64        NVIDIA Vulkan SC Dev package
ii  nvidia-l4t-vulkan-sc-samples                35.3.1-20230315004708                arm64        NVIDIA Vulkan SC samples package
ii  nvidia-l4t-vulkan-sc-sdk                    35.3.1-20230315004708                arm64        NVIDIA Vulkan SC SDK package
ii  nvidia-l4t-wayland                          35.3.1-20230315004708                arm64        NVIDIA Wayland Package
ii  nvidia-l4t-weston                           35.3.1-20230315004708                arm64        NVIDIA Weston Package
ii  nvidia-l4t-x11                              35.3.1-20230315004708                arm64        NVIDIA X11 Package
ii  nvidia-l4t-xusb-firmware                    35.3.1-20230315004708                arm64        NVIDIA USB Firmware Package
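Since nvidia-jetpack may not be installed, the L4T packages in this listing are another place to read the release from when picking a matching container tag. A minimal sketch, assuming input in the dpkg -l column layout shown above; installed_versions and l4t_release are hypothetical helpers, and the pairing of R35.3.1 with JetPack 5.1.1 comes from the first post in this thread.

```python
def installed_versions(dpkg_output):
    """Map package name -> version for rows whose status is 'ii' (installed)."""
    versions = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        # dpkg -l rows look like: STATUS NAME VERSION ARCH DESCRIPTION...
        if len(fields) >= 3 and fields[0] == "ii":
            name = fields[1].split(":", 1)[0]  # drop an ":arm64" arch qualifier
            versions[name] = fields[2]
    return versions


def l4t_release(versions):
    """Return e.g. 'R35.3.1' from the nvidia-l4t-core version, or None if absent."""
    core = versions.get("nvidia-l4t-core")
    if core is None:
        return None
    # "35.3.1-20230315004708" -> "35.3.1" (the suffix is a build timestamp)
    return "R" + core.split("-", 1)[0]
```

On the device, the input can come from subprocess.run(["dpkg", "-l"], capture_output=True, text=True).stdout.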

And apt-cache show nvidia-jetpack shows this for me:

apt-cache show nvidia-jetpack
Package: nvidia-jetpack
Version: 5.1.1-b56
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 5.1.1-b56), nvidia-jetpack-dev (= 5.1.1-b56)
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_5.1.1-b56_arm64.deb
Size: 29304
SHA256: 7b6c8c6cb16028dcd141144b6b0bbaa762616d0a47aafa3c3b720cb02b2c8430
SHA1: 387e4e47133c4235666176032af0f2ec86461dbb
MD5sum: 0a8692031bf35cc46f7a498e2937bda9
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8

After modifying your apt sources and doing sudo apt-get update, hopefully you should be able to install and check this too.

OK, gotcha. Normally, if you use a flashing tool like Balena Etcher, it will compare the checksums to make sure the image was flashed correctly. However, when in doubt, redownload the image and reflash it. It could also be corruption of the SD card itself; if you have a different SD card, you can try that one.
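If you want to check a downloaded image yourself before reflashing, comparing its SHA-256 against a published checksum is one option (this assumes a checksum is available for the image you downloaded; sha256sum does the same from the shell). It verifies the download only, not what ends up on the card. A minimal sketch; sha256_of and matches are hypothetical helpers:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks so large images fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path, expected_hex):
    """Compare the file's digest with a published checksum (hex, any case)."""
    return sha256_of(path) == expected_hex.strip().lower()
```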

Thank you @dusty_nv! After reformatting the SD card and flashing the image again over a regular network, my apt sources list is now the same as yours. apt-cache show nvidia-jetpack also gives the same output as yours, and sudo apt-get update no longer gives any errors. I was able to install PyTorch and torchvision with CUDA without using a virtual environment.

OK thanks @vahdat.melika, glad that you were able to get it working after reflashing your SD card! 👍

It worked after I ran pip3 install 'pillow<9'; import torchvision succeeded.

I have stumbled upon a bug in the function when I run the 2.0 release on the CPU of my Orin devkit (JetPack 5.1.1). I cannot reproduce it with any of the 1.x releases, when I run the multiplication on the GPU, or on an x86_64 CPU.
Under some circumstances, the result of the multiplication contains NaN entries, even if the input tensors are all zeros. The problem does not seem to be entirely deterministic, but I was able to reduce my code to a small example that reliably triggers it:

import torch

print("Torch: ", torch.__version__)
DEV = 'cpu'  # cuda works fine

for M in range(16, 257, 16):

    A = torch.zeros((1, 1), device=DEV)
    B = torch.zeros((1, M), device=DEV)
    C = torch.zeros((1, M), device=DEV)
    # multiply into the preallocated output C (torch.matmul assumed for the call)
    torch.matmul(A, B, out=C)

    print("%s x %s -> %s: " % (tuple(A.shape), tuple(B.shape), tuple(C.shape)), end='')
    if C.isnan().any():
        print("*** %d NaNs found!!!" % torch.count_nonzero(C.isnan()))
    else:
        print("ok")

When I run it, I get the following:

Torch:  2.0.0a0+fe05266f.nv23.04
Num threads: 1
(1, 1) x (1, 16) -> (1, 16): ok
(1, 1) x (1, 32) -> (1, 32): *** 2 NaNs found!!!
(1, 1) x (1, 48) -> (1, 48): *** 2 NaNs found!!!
(1, 1) x (1, 64) -> (1, 64): *** 4 NaNs found!!!
(1, 1) x (1, 80) -> (1, 80): *** 4 NaNs found!!!
(1, 1) x (1, 96) -> (1, 96): *** 6 NaNs found!!!
(1, 1) x (1, 112) -> (1, 112): *** 6 NaNs found!!!
(1, 1) x (1, 128) -> (1, 128): *** 8 NaNs found!!!
(1, 1) x (1, 144) -> (1, 144): *** 8 NaNs found!!!
(1, 1) x (1, 160) -> (1, 160): *** 10 NaNs found!!!
(1, 1) x (1, 176) -> (1, 176): *** 10 NaNs found!!!
(1, 1) x (1, 192) -> (1, 192): *** 12 NaNs found!!!
(1, 1) x (1, 208) -> (1, 208): *** 12 NaNs found!!!
(1, 1) x (1, 224) -> (1, 224): *** 14 NaNs found!!!
(1, 1) x (1, 240) -> (1, 240): *** 14 NaNs found!!!
(1, 1) x (1, 256) -> (1, 256): *** 16 NaNs found!!!

It happens in a standard virtual environment as well as in the Docker image.
The pattern is clear: every time the size of the second tensor increases by 32, two additional NaNs appear. I suspect a vectorization or parallelization bug.
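Stated precisely, the reported counts follow 2 * (M // 32): two NaNs per complete block of 32 output columns, so none at M = 16 and no increase from 32 to 48. A quick check of that reading against the numbers in the output above; it only restates the data and says nothing about the cause.

```python
# NaN counts reported above for (1, 1) x (1, M) on the Orin CPU (2.0.0a0+fe05266f.nv23.04)
reported = {16: 0, 32: 2, 48: 2, 64: 4, 80: 4, 96: 6, 112: 6, 128: 8,
            144: 8, 160: 10, 176: 10, 192: 12, 208: 12, 224: 14, 240: 14, 256: 16}


def predicted(m):
    """Two NaNs per complete block of 32 output columns."""
    return 2 * (m // 32)


mismatches = {m: c for m, c in reported.items() if c != predicted(m)}
print("mismatches:", mismatches)  # prints: mismatches: {}
```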

Has anyone else encountered similar issues?
Is this an NVIDIA-specific bug, or should I report it upstream to the PyTorch developers?


My self-compiled torch gives:
Torch: 2.1.0a0+git53d1d30
(1, 1) x (1, 16) → (1, 16): ok
(1, 1) x (1, 32) → (1, 32): ok
(1, 1) x (1, 48) → (1, 48): ok
(1, 1) x (1, 64) → (1, 64): ok
(1, 1) x (1, 80) → (1, 80): ok
(1, 1) x (1, 96) → (1, 96): ok
(1, 1) x (1, 112) → (1, 112): ok
(1, 1) x (1, 128) → (1, 128): ok
(1, 1) x (1, 144) → (1, 144): ok
(1, 1) x (1, 160) → (1, 160): ok
(1, 1) x (1, 176) → (1, 176): ok
(1, 1) x (1, 192) → (1, 192): ok
(1, 1) x (1, 208) → (1, 208): ok
(1, 1) x (1, 224) → (1, 224): ok
(1, 1) x (1, 240) → (1, 240): ok
(1, 1) x (1, 256) → (1, 256): ok

Hi @user119069, I’d report this upstream on the PyTorch GitHub, since we aren’t making CPU optimizations for aarch64 (the focus is on GPU support). As @herr_dieter_graef found, perhaps this issue was already found and patched upstream in a newer version of PyTorch.

Hi, I built torch with XNNPACK instead of QNNPACK.

Thanks @herr_dieter_graef, @dusty_nv.
The fact that it works with 2.1 is not conclusive, I’m afraid. As I mentioned, minor changes in the code can make the problem disappear (e.g. assigning the result directly, C = …(A, B), instead of passing out=C works for me in this code, but it was exactly that form that triggered the bug in the original code).
I will try to compile 2.1 myself and/or check upstream whether the problem is known/fixed. I will keep you posted.

Best regards,

Do you have the latest version, torch-2.0.0+nv23.05, built with distributed support? Otherwise, do you have the “export” options (the build environment variables) from the source build of the last version you published?

Hi @storm12t48, I don’t; you would need to build it from source after installing libopenmpi-dev.
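After a source build with USE_DISTRIBUTED=1, one way to confirm what the resulting torch actually supports is torch.distributed.is_available() together with torch.distributed.is_mpi_available(). A small sketch, assuming MPI support is what the libopenmpi-dev dependency is meant to enable; distributed_support is a hypothetical helper that returns None when torch isn’t importable.

```python
import importlib.util


def distributed_support():
    """Report distributed/MPI support of the installed torch, or None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch.distributed as dist

    available = dist.is_available()
    return {
        "distributed": available,
        # only query the MPI backend when the distributed package was built in
        "mpi": available and dist.is_mpi_available(),
    }


print(distributed_support())
```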

Thanks for the help!
I managed to build it from source, and I wrote a script with up-to-date instructions in case anyone needs it:

#!/bin/bash
set -e
echo "Installing PyTorch 2.0.0 (distributed) on your Jetson AGX Orin"
sudo apt update
sudo apt upgrade -y
sudo apt-get -y install python3-pip
sudo apt-get -y install autoconf bc build-essential g++-9 gcc-9 clang-8 lld-8 gettext-base gfortran-8 iputils-ping libbz2-dev libc++-dev libcgal-dev libffi-dev libfreetype6-dev libhdf5-dev libjpeg-dev liblzma-dev libncurses5-dev libncursesw5-dev libpng-dev libreadline-dev libssl-dev libsqlite3-dev libxml2-dev libxslt-dev locales moreutils openssl python-openssl rsync scons python3-pip libopenblas-dev
sudo apt-get -y install cmake libopenmpi-dev
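# make sure /usr/local/cuda points at the CUDA toolkit
# (assumed default: JetPack 5.1.x ships CUDA 11.4 under /usr/local/cuda-11.4)
patch_cuda=${patch_cuda:-/usr/local/cuda-11.4}
if ! [ -e /usr/local/cuda ]; then
    sudo ln -s "$patch_cuda" /usr/local/cuda
fi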
git clone --recursive --branch v2.0.0
cd pytorch
export USE_NCCL=0
export USE_DISTRIBUTED=1                  
export USE_QNNPACK=0
export TORCH_CUDA_ARCH_LIST="7.2;8.7" 
export CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
export CUDA_BIN_PATH=/usr/local/cuda/bin
export CMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
export CUDNN_LIB_DIR=/usr/local/cuda/lib64
export USE_CUDA=ON
export CC=gcc
export CXX=g++
pip3 install -r requirements.txt
pip3 install scikit-build
pip3 install ninja
python3 setup.py develop && python3 -c "import torch"
python3 setup.py bdist_wheel
pip3 install dist/*.whl
echo "Congratulations!"
echo "You've successfully installed PyTorch 2.0.0 (distributed) on your Jetson AGX Orin"

### Just add your arch, 8.7 (jtop shows CUDA arch BIN 8.7 for the Jetson AGX Orin on JetPack 5.1.1),
### to torch.utils.cpp_extension, like this patch:
### ref:
### i.e. add '8.7' to the list like this: supported_arches = ['3.5', '3.7', '5.0', '5.2', '5.3', '6.0', '6.1', '6.2', '7.0', '7.2', '7.5', '8.0', '8.6', '8.7', '8.9', '9.0']

This worked great for me. Thanks!
BTW, if folks have issues installing torchvision on top of this with:
ValueError: Unknown CUDA arch (8.7) or GPU not supported
you can patch _get_cuda_arch_flags() (part of torch.utils.cpp_extension, under site-packages/torch/utils/) to accept that value. See Building PyTorch from source fails - #2 by dusty_nv
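That error comes from the arch-list validation in torch.utils.cpp_extension. A simplified sketch of that kind of check (not the actual PyTorch code): each TORCH_CUDA_ARCH_LIST entry must appear in a supported list before it is turned into an nvcc -gencode flag, which is why 8.7 has to be in the list. arch_flags and SUPPORTED are illustrative names; the list mirrors the one in the script comments, with '8.7' included.

```python
SUPPORTED = ['3.5', '3.7', '5.0', '5.2', '5.3', '6.0', '6.1', '6.2',
             '7.0', '7.2', '7.5', '8.0', '8.6', '8.7', '8.9', '9.0']


def arch_flags(arch_list, supported=SUPPORTED):
    """Turn a TORCH_CUDA_ARCH_LIST-style string (e.g. "7.2;8.7") into -gencode flags."""
    flags = []
    for arch in arch_list.replace(" ", ";").split(";"):
        if not arch:
            continue
        base = arch.split("+")[0]  # "8.7+PTX" -> "8.7"
        if base not in supported:
            raise ValueError(f"Unknown CUDA arch ({arch}) or GPU not supported")
        num = base.replace(".", "")  # "8.7" -> "87"
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if arch.endswith("+PTX"):
            # also embed PTX so newer GPUs can JIT-compile the kernels
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags
```

With the script’s TORCH_CUDA_ARCH_LIST, arch_flags("7.2;8.7") yields the compute_72/sm_72 and compute_87/sm_87 flags. When editing the real list, keep a comma between '8.7' and '8.9': Python concatenates adjacent string literals, so '8.7' '8.9' silently becomes the single entry '8.78.9'.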

After python3 setup.py install --user, I get:
ModuleNotFoundError: No module named 'torch'

@linyongboole, what does pip3 show torch list? And are you able to run python3 -c 'import torch'?
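One common cause of a ModuleNotFoundError after a --user install is that the interpreter doing the import isn’t looking at the user site-packages directory the package went into (for example, a different python3, or user site-packages disabled inside a virtual environment). That is a guess rather than a diagnosis; a quick standard-library sketch to print the relevant facts (torch_import_report is a hypothetical helper):

```python
import importlib.util
import site
import sys


def torch_import_report():
    """Which interpreter is running, its user site-packages, and where torch resolves."""
    spec = importlib.util.find_spec("torch")
    return {
        "python": sys.executable,
        "user_site": site.getusersitepackages(),
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "torch_origin": spec.origin if spec else None,  # None means torch isn't importable
    }


for key, value in torch_import_report().items():
    print(f"{key}: {value}")
```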