Orin PyTorch/CUDA issue

We are having trouble getting PyTorch to work on Orin. We have JetPack 5.1.1-b56 installed and downloaded PyTorch 2.0.0 from this website: PyTorch for Jetson

The output of the installation:

pip3 install numpy torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl 
Defaulting to user installation because normal site-packages is not writeable
Processing ./torch-2.0.0+nv23.05-cp38-cp38-linux_aarch64.whl
Requirement already satisfied: numpy in /usr/lib/python3/dist-packages (1.17.4)
Requirement already satisfied: filelock in /home/nvidia/.local/lib/python3.8/site-packages (from torch==2.0.0+nv23.05) (3.12.2)
Requirement already satisfied: jinja2 in /home/nvidia/.local/lib/python3.8/site-packages (from torch==2.0.0+nv23.05) (3.1.2)
Requirement already satisfied: networkx in /home/nvidia/.local/lib/python3.8/site-packages (from torch==2.0.0+nv23.05) (3.1)
Requirement already satisfied: sympy in /home/nvidia/.local/lib/python3.8/site-packages (from torch==2.0.0+nv23.05) (1.12)
Requirement already satisfied: typing-extensions in /home/nvidia/.local/lib/python3.8/site-packages (from torch==2.0.0+nv23.05) (4.7.1)
Requirement already satisfied: MarkupSafe>=2.0 in /home/nvidia/.local/lib/python3.8/site-packages (from jinja2->torch==2.0.0+nv23.05) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /home/nvidia/.local/lib/python3.8/site-packages (from sympy->torch==2.0.0+nv23.05) (1.3.0)
DEPRECATION: distro-info 0.23ubuntu1 has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of distro-info or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063
DEPRECATION: python-debian 0.1.36ubuntu1 has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of python-debian or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063
Installing collected packages: torch
Successfully installed torch-2.0.0+nv23.5

When we start python3 in the terminal and run import torch, the following error message is shown:

>>> import torch
Traceback (most recent call last):
  File "/home/nvidia/.local/lib/python3.8/site-packages/torch/__init__.py", line 168, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nvidia/.local/lib/python3.8/site-packages/torch/__init__.py", line 228, in <module>
    _load_global_deps()
  File "/home/nvidia/.local/lib/python3.8/site-packages/torch/__init__.py", line 189, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/home/nvidia/.local/lib/python3.8/site-packages/torch/__init__.py", line 154, in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
ValueError: libcublas.so.*[0-9] not found in the system path ['', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/home/nvidia/.local/lib/python3.8/site-packages', '/usr/local/lib/python3.8/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.8/dist-packages']

We believe this might have something to do with conflicting or missing CUDA versions. We currently have CUDA 11.4 installed, but not CUDA 10.
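A note on the version number: the 10 in libcurand.so.10 is the library's own soname version, not the CUDA release. The listing further down in this thread shows libcurand.so.10 in a CUDA Toolkit install, so the error points to a missing library rather than a need for CUDA 10. As a rough sketch, the loader can be probed directly with the same ctypes.CDLL call that appears in the traceback. The list of sonames is an assumption for illustration: libcurand.so.10 and libcudart.so.11.0 appear in this thread, while libcublas.so.11 is assumed for CUDA 11.4.

```python
import ctypes

# Sonames to probe. libcurand.so.10 and libcudart.so.11.0 appear in this
# thread; libcublas.so.11 is an assumption for CUDA 11.4.
CANDIDATE_LIBS = ["libcudart.so.11.0", "libcurand.so.10", "libcublas.so.11"]


def probe_libraries(names):
    """Return {name: None if the loader can open it, else the error text}."""
    results = {}
    for name in names:
        try:
            # Same call PyTorch makes in _load_global_deps (see traceback).
            ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, error in probe_libraries(CANDIDATE_LIBS).items():
        status = "OK" if error is None else f"MISSING ({error})"
        print(f"{name}: {status}")
```

Any library reported as missing here would also fail when PyTorch tries to load it.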

Hi,

It looks like some CUDA libraries are missing in your environment.
Did you also install the components from JetPack 5.1.1?

Thanks.


Thanks for the quick answer. We already have CUDA 11.4 installed.

Can you please clarify what we are missing?
Could it perhaps be related to an environment variable?
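On the environment variable question, a minimal sketch for checking whether a directory is listed in LD_LIBRARY_PATH. The directory /usr/local/cuda/lib64 is the one that appears elsewhere in this thread; the helper names are hypothetical. Note that the dynamic loader also consults the ldconfig cache and default system directories, so a directory missing from LD_LIBRARY_PATH is not conclusive on its own.

```python
import os

CUDA_LIB_DIR = "/usr/local/cuda/lib64"  # path that appears in this thread


def library_path_entries(environ=os.environ):
    """Split LD_LIBRARY_PATH into its non-empty entries."""
    raw = environ.get("LD_LIBRARY_PATH", "")
    return [entry for entry in raw.split(":") if entry]


def is_on_library_path(directory, environ=os.environ):
    """True if `directory` is listed in LD_LIBRARY_PATH (after normalising)."""
    wanted = os.path.normpath(directory)
    return any(
        os.path.normpath(entry) == wanted
        for entry in library_path_entries(environ)
    )


if __name__ == "__main__":
    print("LD_LIBRARY_PATH entries:", library_path_entries())
    print(f"{CUDA_LIB_DIR} listed:", is_on_library_path(CUDA_LIB_DIR))
```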

@s.k can you confirm that you have these libraries installed in the CUDA Toolkit? They should have been installed by SDK Manager when you flashed the board:

ls -ll /usr/local/cuda/lib64/libcurand*
lrwxrwxrwx 1 root root       15 Sep 14  2022 /usr/local/cuda/lib64/libcurand.so -> libcurand.so.10
lrwxrwxrwx 1 root root       23 Sep 14  2022 /usr/local/cuda/lib64/libcurand.so.10 -> libcurand.so.10.2.5.297
-rw-r--r-- 1 root root 77999248 Sep 14  2022 /usr/local/cuda/lib64/libcurand.so.10.2.5.297
-rw-r--r-- 1 root root 77968142 Sep 14  2022 /usr/local/cuda/lib64/libcurand_static.a

You could also try the l4t-pytorch container, which comes with it all pre-installed for you.


This is the output of ls -ll in that directory:

nvidia@orin:/usr/local/cuda/lib64$ ls -ll
total 2672
-rw-r--r-- 1 root root  801732 sep.  14  2022 libcudadevrt.a
lrwxrwxrwx 1 root root      17 sep.  14  2022 libcudart.so -> libcudart.so.11.0
lrwxrwxrwx 1 root root      21 sep.  14  2022 libcudart.so.11.0 -> libcudart.so.11.4.298
-rw-r--r-- 1 root root  699488 sep.  14  2022 libcudart.so.11.4.298
-rw-r--r-- 1 root root 1189054 sep.  14  2022 libcudart_static.a
-rw-r--r-- 1 root root   36394 sep.  14  2022 libculibos.a
drwxr-xr-x 2 root root    4096 aug.   2 13:13 stubs
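The listing above contains only the CUDA runtime (libcudart); libcurand and libcublas, the two libraries named in the traceback, are absent. A small sketch that automates this comparison for a given directory follows. The function name and the list of stems are hypothetical, and on other setups the libraries may live in a different directory.

```python
import glob
import os

# Library name stems to look for. libcurand and libcublas are the two named
# in the traceback; libcudart is present in the listing above.
EXPECTED_STEMS = ["libcudart", "libcurand", "libcublas"]


def missing_libraries(lib_dir, stems=EXPECTED_STEMS):
    """Return the stems that have no matching shared object in lib_dir."""
    missing = []
    for stem in stems:
        # Matches e.g. libcurand.so, libcurand.so.10, libcurand.so.10.2.5.297
        # but not static archives such as libcurand_static.a.
        if not glob.glob(os.path.join(lib_dir, stem + ".so*")):
            missing.append(stem)
    return missing


if __name__ == "__main__":
    print("Missing:", missing_libraries("/usr/local/cuda/lib64"))
```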

We will try a fresh install of the CUDA components with SDK Manager. If it doesn’t work, we will try the container. Thanks a lot!

@s.k you could also try sudo apt-get install cuda-toolkit-11-4

Problem resolved by installing the necessary CUDA components from SDK Manager. Thanks!
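For anyone verifying the same fix, a minimal sketch of a post-install check. It only assumes that the torch wheel from this thread is installed; check_torch_cuda is a hypothetical helper name, and the exception types caught are the ones seen in the traceback above plus ImportError.

```python
import importlib.util


def check_torch_cuda():
    """Return a short status string describing PyTorch's view of CUDA."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    try:
        import torch
    except (OSError, ValueError, ImportError) as exc:
        # The failure in this thread surfaced as OSError, then ValueError.
        return f"torch failed to import: {exc}"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__} imported, but CUDA is not available"
    return f"torch {torch.__version__} sees GPU: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(check_torch_cuda())
```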

Please help me with this.
I am also facing a missing CUDA problem after installing JetPack 5.1 on an NVIDIA Jetson Orin NX.

I tried to install CUDA on the Jetson with this command:

  • sudo apt-get install nvidia-cuda

But it does not work. I also tried the command you suggested above, but I get an error.

Please help me solve this problem.
Thank you!

@hieprooney16 can you run cat /etc/apt/sources.list.d/nvidia-l4t-apt-source.list? Did you run sudo apt-get update first?

When you flashed the board, did you have SDK Manager install CUDA/cuDNN/TensorRT? You can run SDK Manager again, de-select the flashing part, and it will just install CUDA/etc. on your Jetson for you.

I installed JetPack based on this link: JETSON-ORIN-NX-16G-DEV-KIT - Waveshare Wiki.
I followed the 4 steps shown in the image below.


Do I need to run this command after the 4 steps above:

Thanks

Yes. I’m not entirely sure what state Waveshare’s flashing sequence puts the board in, but presumably sudo apt-get update && sudo apt-get install nvidia-jetpack would install CUDA/etc. for you.
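To see whether the packages mentioned in this thread actually ended up installed, a rough sketch that queries dpkg. The package names nvidia-jetpack and cuda-toolkit-11-4 are the ones suggested above; is_installed is a hypothetical helper, and it returns None on systems without dpkg-query.

```python
import shutil
import subprocess

PACKAGES = ["nvidia-jetpack", "cuda-toolkit-11-4"]  # names from this thread


def is_installed(package):
    """Return True/False from dpkg-query, or None if dpkg-query is absent."""
    if shutil.which("dpkg-query") is None:
        return None
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Status}", package],
        capture_output=True,
        text=True,
    )
    # dpkg-query exits non-zero for unknown packages; a fully installed
    # package reports the status string "install ok installed".
    return result.returncode == 0 and "install ok installed" in result.stdout


if __name__ == "__main__":
    for package in PACKAGES:
        print(f"{package}: {is_installed(package)}")
```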
