Incompatible torch2.2+CUDA 12.2 wheel with other Python libraries on AGX Orin JetPack 6.0

I’m trying to install torch2.2 with CUDA enabled (CUDA 12.2) on my AGX Orin Developer Kit. I have followed the instructions here and have installed torch-2.2.0a0+81ea7a4.nv24.01-cp310-cp310-linux_aarch64.whl successfully. However, I also need to install other packages that depend on a torch installation (versions selected based on known compatibility with torch2.2):

  • torchvision==0.17.0
  • torchaudio==2.2.0
  • kornia==0.7.1
  • lightning==2.2.0

The issue I encounter is that when I install any of these packages, a new wheel for torch==2.2.0 without CUDA is downloaded and installed as a required dependency, effectively overwriting my previous installation of torch==2.2.0a0+81ea7a4 and making me lose CUDA access.

How can I install my desired packages alongside the provided wheel torch-2.2.0a0+81ea7a4.nv24.01-cp310-cp310-linux_aarch64.whl and keep CUDA enabled? Which versions or specific wheels should I use?
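
To tell which of the two torch builds is currently installed, something like the following could help. It is only a sketch: the helper name summarize_torch_build is made up for illustration, and it relies on torch.version.cuda being None for CPU-only builds. The local tag after the "+" is shown for information only, since a CUDA-enabled wheel does not necessarily carry one.

```python
from typing import Optional


def summarize_torch_build(version: str, cuda: Optional[str]) -> dict:
    """Split a torch version string and note whether the build was compiled with CUDA.

    `version` is expected to be torch.__version__ and `cuda` to be
    torch.version.cuda, which is None for CPU-only builds.
    (Hypothetical helper, not part of torch.)
    """
    public, _, local = version.partition("+")
    return {
        "public_version": public,
        "local_tag": local or None,
        "built_with_cuda": cuda is not None,
    }


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print(summarize_torch_build(torch.__version__, torch.version.cuda))
```

Running this before and after installing one of the packages above should show whether the CUDA build was swapped for a CPU-only one.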

System specifications:

  • JetPack 6.0
  • L4T version: R36.2.0
  • CUDA 12.2

You have to build it from source.

Or reinstall the CUDA-enabled PyTorch wheel after installing the other packages. Wheels for torchvision and torchaudio have also been posted here:

Keeping all the versions straight and tested, with CUDA enabled, is one of the reasons for the jetson-containers project, but you can manually install the wheels that it built if you don’t wish to use Docker.

Thanks @dusty_nv! I have actually tried those torch2.3 wheels (including torchaudio and torchvision), but installing that combination did not give me CUDA support. Also, I need the lightning library, which is not compatible with torch2.3, so I do need to stay on torch2.2. Are torchvision and torchaudio wheels also available for the torch==2.2.0a0+81ea7a4 version?

Re containers: Would you be able to point me to the container in that project which matches my system requirements (cuda12.2) and keeps torch2.2 version on all torch wheels? I could not find one there…

Hi @lfermoselle, yep you can find the previous wheels for torchaudio and torchvision here:

And the l4t-pytorch container comes with PyTorch 2.2, torchaudio 2.2, and torchvision 0.17:

These containers are built to use that pip server from above, so when you pip3 install lightning on top, it should automatically keep/install my CUDA-enabled torch wheels instead of pulling the CPU-only ones from PyPI.
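
One way to confirm that an install on top really left the torch wheels alone is to compare the installed versions before and after. This is a rough sketch using the standard-library importlib.metadata; the helper names snapshot and changed and the WATCHED tuple are made up for illustration.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple

# Packages whose versions should stay fixed (illustrative choice).
WATCHED = ("torch", "torchvision", "torchaudio")


def snapshot(packages: Iterable[str] = WATCHED) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def changed(
    before: Dict[str, Optional[str]], after: Dict[str, Optional[str]]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each package whose version differs to its (before, after) pair."""
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }
```

Take one snapshot() before pip3 install lightning and another afterwards; an empty result from changed(before, after) means none of the watched packages were replaced.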


Thank you so much @dusty_nv! I have tried the container you provided dustynv/l4t-pytorch:r36.2.0 and I see the following wheels installed:

  • torch2.2.0
  • torchvision0.17.2+c1d70fe
  • torchaudio2.2.2+cefdb36

It seems the torchvision and torchaudio wheels are CUDA-enabled, but the main torch wheel is missing CUDA. I have tested your image fresh and have not installed any of my desired Python libraries there yet. Is this the expected behaviour? Should I still install the torch-2.2.0a0+81ea7a4.nv24.01-cp310-cp310-linux_aarch64.whl wheel in this image myself?

No, you should not need to. It is strange, because the same container image reports torch2.2.0 as having CUDA enabled in the tests here, on different Jetsons with R36.2 and R36.3…

For example, this is the output I get from running this torch sanity test script on R36.3 using the dustynv/l4t-pytorch:r36.2.0 image (which was built on another machine):

testing PyTorch...
PyTorch version: 2.2.0
CUDA available:  True
cuDNN version:   8904
PyTorch built with:
  - GCC 11.4
  - C++ Version: 201703
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - CUDA Runtime 12.2
  - NVCC architecture flags: -gencode;arch=compute_87,code=sm_87
  - CuDNN 8.9.4
  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CUDA_VERSION=12.2, CUDNN_VERSION=8.9.4, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=open, TORCH_VERSION=2.2.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=0, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

PACKAGING_VERSION=2.2.0
TORCH_CUDA_ARCH_LIST=8.7

/mount/jetson-containers/dev/packages/pytorch/test.py:23: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at /opt/pytorch/torch/csrc/tensor/python_tensor.cpp:83.)
  a = torch.cuda.FloatTensor(2).zero_()
Tensor a = tensor([0., 0.], device='cuda:0')
Tensor b = tensor([-0.1736,  1.9561], device='cuda:0')
Tensor c = tensor([-0.1736,  1.9561], device='cuda:0')
testing LAPACK (OpenBLAS)...
done testing LAPACK (OpenBLAS)
testing torch.nn (cuDNN)...
done testing torch.nn (cuDNN)
testing CPU tensor vector operations...
/mount/jetson-containers/dev/packages/pytorch/test.py:62: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  cpu_y = F.softmax(cpu_x)
Tensor cpu_x = tensor([12.3450])
Tensor softmax = tensor([1.])
Tensor exp (float32) = tensor([[2.7183, 2.7183, 2.7183],
        [2.7183, 2.7183, 2.7183],
        [2.7183, 2.7183, 2.7183]])
Tensor exp (float64) = tensor([[2.7183, 2.7183, 2.7183],
        [2.7183, 2.7183, 2.7183],
        [2.7183, 2.7183, 2.7183]], dtype=torch.float64)
Tensor exp (diff) = 7.429356050359104e-07
PyTorch OK

Thank you for the quick reply and pointing out this test script @dusty_nv. It seems the container is having trouble finding the CUDA driver on my system. Here is the output of running your test script after pulling your container with docker pull dustynv/l4t-pytorch:r36.2.0, starting with docker run --name test -it dustynv/l4t-pytorch:r36.2.0 and typing python3, on my Jetson R36.2:

print('testing PyTorch...')                                                                                                               
testing PyTorch...                                                                                                                            
                                                                                                                                  
import torch                                                                                                                              
                                                                                                                                          
print('PyTorch version: ' + str(torch.__version__))                                                                                       
PyTorch version: 2.2.0                                                                                                                        
print('CUDA available:  ' + str(torch.cuda.is_available()))                                                                               
CUDA available:  False                                                                                                                        
print('cuDNN version:   ' + str(torch.backends.cudnn.version()))                                                                          
cuDNN version:   8904                                                                                                                         
                                                                                                                                          
print(torch.__config__.show())                                                                                                            
PyTorch built with:                                                                                                                           
  - GCC 11.4                                                                                                                                  
  - C++ Version: 201703                                                                                                                       
  - OpenMP 201511 (a.k.a. OpenMP 4.5)                                                                                                         
  - LAPACK is enabled (usually provided by MKL)                                                                                               
  - NNPACK is enabled                                                                                                                         
  - CPU capability usage: NO AVX                                                                                                              
  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CUDA_VERSION=12.2, CUDNN_VERSION=8.9.4, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=open, TORCH_VERSION=2.2.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=0, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
                                                                                                                                              
                                                                                                                                          
# fail if CUDA isn't available                                                                                                            
assert(torch.cuda.is_available())                                                                                                         
Traceback (most recent call last):                                                                                                            
  File "<stdin>", line 1, in <module>                                                                                                         
AssertionError                                                                                                                                
                                                                                                                                          
# check that version can be parsed                                                                                                        
from packaging import version                                                                                                             
from os import environ                                                                                                                    
                                                                                                                                          
print('PACKAGING_VERSION=' + str(version.parse(torch.__version__)))                                                                       
PACKAGING_VERSION=2.2.0                                                                                                                       
print('TORCH_CUDA_ARCH_LIST=' + environ.get('TORCH_CUDA_ARCH_LIST', 'None') + '\n')                                                       
TORCH_CUDA_ARCH_LIST=8.7                                                                                                       
                                                                                                                                  
                                                                                                                           
# quick cuda tensor test                                                                                                      
a = torch.cuda.FloatTensor(2).zero_()                                                                                      
<stdin>:1: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at /opt/pytorch/torch/csrc/tensor/python_tensor.cpp:83.)
Traceback (most recent call last): 
  File "<stdin>", line 1, in <module>                                                                                                         
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 302, in _lazy_init
    torch._C._cuda_init()                                                                                                         
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
print('Tensor a = ' + str(a))                                         
Traceback (most recent call last):                                                                                                            
  File "<stdin>", line 1, in <module>                          
NameError: name 'a' is not defined                              
                                                           
b = torch.randn(2).cuda()                                             
Traceback (most recent call last):                                        
  File "<stdin>", line 1, in <module>                                     
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 302, in _lazy_init
    torch._C._cuda_init()                                              
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
print('Tensor b = ' + str(b))                                         
Traceback (most recent call last):                                     
  File "<stdin>", line 1, in <module>                                     
NameError: name 'b' is not defined                                        
                                                                      
c = a + b                                                             
Traceback (most recent call last):                                        
  File "<stdin>", line 1, in <module>                                     
NameError: name 'a' is not defined                                        
print('Tensor c = ' + str(c))                                         
Traceback (most recent call last):                                        
  File "<stdin>", line 1, in <module>
NameError: name 'c' is not defined                                        
                                                                      
# LAPACK test                                                         
print('testing LAPACK (OpenBLAS)...')                                 
testing LAPACK (OpenBLAS)...                                              
                                                                      
a = torch.randn(2, 3, 1, 4, 4)                                        
b = torch.randn(2, 3, 1, 4, 4)                                        
                                                                      
x, lu = torch.linalg.solve(b, a)                                      
                                                                      
print('done testing LAPACK (OpenBLAS)')                               
done testing LAPACK (OpenBLAS)                                            
                                                                      
# torch.nn test                                                    
print('testing torch.nn (cuDNN)...')                                  
testing torch.nn (cuDNN)...                                            
                                                                      
import torch.nn                                                       
                                                                   
model = torch.nn.Conv2d(3,3,3)                                        
data = torch.zeros(1,3,10,10)                                         
model = model.cuda()                                               
Traceback (most recent call last):                                        
  File "<stdin>", line 1, in <module>                                     
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 911, in cuda
    return self._apply(lambda t: t.cuda(device))                                                                                              
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 825, in _apply
    param_applied = fn(param)                                          
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 911, in <lambda>
    return self._apply(lambda t: t.cuda(device))                                                                                              
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 302, in _lazy_init
    torch._C._cuda_init()                                              
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
data = data.cuda()                                                 
Traceback (most recent call last):                                     
  File "<stdin>", line 1, in <module>                                  
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 302, in _lazy_init
    torch._C._cuda_init()                                              
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
out = model(data)                                                  
                                                                   
#print(out)                                                        
                                                                   
print('done testing torch.nn (cuDNN)')                                                                                                    
done testing torch.nn (cuDNN)                                          
                                                                   
# CPU test (https://github.com/pytorch/pytorch/issues/47098)                                                                              
print('testing CPU tensor vector operations...')                                                                                          
testing CPU tensor vector operations...                                                                                                       
                                                                   
import torch.nn.functional as F                                    
cpu_x = torch.tensor([12.345])                                     
cpu_y = F.softmax(cpu_x)                                           
<stdin>:1: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
                                                                   
print('Tensor cpu_x = ' + str(cpu_x))                                                                                                     
Tensor cpu_x = tensor([12.3450])                                       
print('Tensor softmax = ' + str(cpu_y))                                                                                                   
Tensor softmax = tensor([1.])                                          
                                                                   
if cpu_y != 1.0:                                                   
...     raise ValueError('PyTorch CPU tensor vector test failed (softmax)\n')
...                                                                    
# https://github.com/pytorch/pytorch/issues/61110                                                                                         
t_32 = torch.ones((3,3), dtype=torch.float32).exp()                                                                                       
t_64 = torch.ones((3,3), dtype=torch.float64).exp()                                                                                       
diff = (t_32 - t_64).abs().sum().item()                                                                                                   
                                                                   
print('Tensor exp (float32) = ' + str(t_32))                                                                                              
Tensor exp (float32) = tensor([[2.7183, 2.7183, 2.7183],                                                                                      
        [2.7183, 2.7183, 2.7183],                                      
        [2.7183, 2.7183, 2.7183]])                                     
print('Tensor exp (float64) = ' + str(t_64))                                                                                              
Tensor exp (float64) = tensor([[2.7183, 2.7183, 2.7183],                                                                                      
        [2.7183, 2.7183, 2.7183],                                      
        [2.7183, 2.7183, 2.7183]], dtype=torch.float64)                                                                                       
print('Tensor exp (diff) = ' + str(diff))                                                                                                 
Tensor exp (diff) = 7.429356050359104e-07                                                                                                     
                                                                   
if diff > 0.1:                                                     
...     raise ValueError(f'PyTorch CPU tensor vector test failed (exp, diff={diff})')
...                                                                    
... print('PyTorch OK\n')                                              
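
The session above reports USE_CUDA=ON and CUDA_VERSION=12.2 in the build settings while torch.cuda.is_available() returns False, which suggests the wheel itself is a CUDA build and the problem lies in the runtime environment. A small sketch of that distinction follows; the diagnose helper and its return strings are invented for illustration and are not part of torch.

```python
from typing import Optional


def diagnose(built_cuda: Optional[str], cuda_available: bool) -> str:
    """Classify a torch install from torch.version.cuda and torch.cuda.is_available().

    Hypothetical helper: the labels are illustrative only.
    """
    if built_cuda is None:
        # torch.version.cuda is None when the wheel was compiled without CUDA.
        return "cpu-only build"
    if not cuda_available:
        # Compiled against CUDA, but no usable GPU/driver was found at runtime.
        return "cuda build, GPU/driver not visible at runtime"
    return "cuda build, GPU usable"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print(diagnose(torch.version.cuda, torch.cuda.is_available()))
```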

Outside the container, when I run nvcc --version on my AGX Orin I get the following:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:08:11_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

My system Jetson version is: nvidia-l4t-core 36.2.0-20231218214829

I have been able to install torch2.2+cuda before using the wheel torch-2.2.0a0+81ea7a4.nv24.01-cp310-cp310-linux_aarch64.whl, so I believe my system has a valid CUDA driver. Am I missing some configuration step to be able to use your container with torch2.2+cuda?

Thank you for all your help with this issue!

I have solved the issue above: I was just missing CUDA on my PATH variable. Adding export PATH=${CUDA_HOME}/bin:${PATH} fixed it. I now have torch2.2, torchvision and torchaudio correctly installed with CUDA enabled. Thank you for this image @dusty_nv!
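
For anyone who wants to check the same thing from Python, here is a rough equivalent of that export. It is a sketch only: the with_cuda_bin helper is made up for illustration, and the /usr/local/cuda fallback for CUDA_HOME is an assumption that may not match every setup.

```python
import os
import shutil


def with_cuda_bin(path: str, cuda_home: str) -> str:
    """Return `path` with <cuda_home>/bin prepended, unless it is already listed."""
    cuda_bin = os.path.join(cuda_home, "bin")
    entries = [p for p in path.split(os.pathsep) if p]
    if cuda_bin in entries:
        return path
    return os.pathsep.join([cuda_bin, *entries])


if __name__ == "__main__":
    # Assumption: fall back to /usr/local/cuda when CUDA_HOME is not set.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    print("nvcc found at:", shutil.which("nvcc"))
    print("PATH with CUDA:", with_cuda_bin(os.environ.get("PATH", ""), cuda_home))
```

This only prints the adjusted value; the export itself still has to go in the shell (or a profile file) to take effect for other processes.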
