I am getting the following error after running the "import torch" command, as shown below:
I have JetPack 5.1.2 installed on my Jetson AGX Xavier. Please let me know how to resolve this error.
The CUDA version installed is 12.0, the CUDA upgrade package is 12.2, the Python version is 3.8, the Torch version is 2.1, and the numpy version that got installed during the installation process is 1.24.4.
root@linux:/home/trident# $ python3
bash: $: command not found
root@linux:/home/trident# export LD_LIBRARY_PATH=/usr/lib/llvm-8/lib:$LD_LIBRARY_PATH
root@linux:/home/trident# python3
Python 3.8.10 (default, Nov 22 2023, 10:22:35)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 168, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcufft.so.10: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 228, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 189, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 154, in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
ValueError: libcublas.so.*[0-9] not found in the system path ['', '/usr/lib/python38.zip', '/usr/lib/python3.8', '/usr/lib/python3.8/lib-dynload', '/usr/local/lib/python3.8/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.8/dist-packages']
>>> print(torch.__version__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'torch' is not defined
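For reference, a quick way to check which CUDA the system is actually set up with and which CUDA libraries the dynamic linker can find (the paths and versions below are examples, not necessarily what is on this unit):

# Which CUDA installation does /usr/local/cuda currently point to?
readlink -f /usr/local/cuda
# Which cuFFT/cuBLAS libraries can the dynamic linker see?
ldconfig -p | grep -E 'libcufft|libcublas'
# Which CUDA packages are installed from apt?
dpkg -l | grep -i cuda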
Hi @nagesh_accord, the PyTorch wheels for JetPack 5 were built against CUDA 11.4. The PyTorch binaries aren’t compatible across changes in the major version of CUDA or cuDNN. You would either need to recompile PyTorch for your environment or change back to CUDA 11.4. There are steps for building PyTorch in this topic:
Thanks for the updates. As I am a bit new to CUDA, PyTorch, OpenCV and the other related tools, please bear with my queries.
Please recommend which would be the better method: changing back to CUDA 11.4 or recompiling PyTorch for JetPack 5.1.2.
In either case, please point me to the correct steps, i.e. how to uninstall CUDA 12.0/12.2 and install CUDA 11.4,
and also how to recompile PyTorch.
I also wanted to know the correct order of installation for
the CUDA Toolkit, PyTorch, OpenCV, etc.,
i.e. which one needs to be installed first, second, third, and so on.
In the documentation it says the maximum PyTorch version compatible with JetPack 5.1.x
is PyTorch 2.0.0,
but the link you shared above says PyTorch 2.1.0 is compatible with JetPack 5.1.2.
Which is right?
I had installed PyTorch 2.1.0 based on that link only. Please clarify.
@nagesh_accord I believe that compatibility table just pertains to the official PyTorch wheels that NVIDIA releases; however, newer versions can continue to be built (like I do, and post to that thread). You can find instructions for building from source in that topic. I'm not sure how to undo the steps you have taken so far with installing different versions of CUDA/etc. in your environment (although it could be as easy as changing what the /usr/local/cuda symbolic link points to)
JetPack already comes with the CUDA Toolkit and OpenCV after flashing your device with SDK Manager and allowing it to complete the post-install steps, so you should just need to install PyTorch. If you continue running into issues, you may just want to re-flash to get your system into a known-working state again. Or in that case, I also recommend trying the l4t-pytorch container, which already includes PyTorch, torchvision, OpenCV, etc. pre-installed inside the container:
OK. How do I change the symbolic link to the version of CUDA that we want, in case I have more than one version of the CUDA Toolkit installed?
I am working on a customized carrier board, where I have flashed the updated BSP myself onto the SOM module.
So SDK Manager does not work on the SOM module which is present on the customized carrier board. Correct me if I am wrong?
So I need to install the CUDA Toolkit, PyTorch, OpenCV, etc. explicitly myself on the SOM.
I just want to know whether the PyTorch container also installs the complete CUDA Toolkit package?
First, make sure that /usr/local/cuda is indeed a symbolic link by checking ls -ll /usr/local/cuda* (you will see what they point to). Then rm -rf /usr/local/cuda and re-link it to CUDA 11 with ln -s /usr/local/cuda-11 /usr/local/cuda. If it is still not working after this and you are unfamiliar with Linux/CUDA, you probably just want to re-flash.
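For example, a minimal sketch of that check-and-relink sequence, assuming CUDA 11.4 was installed under /usr/local/cuda-11.4 (adjust the path to whatever the ls output shows on your system):

# See what the cuda symlinks currently point to
ls -l /usr/local/cuda*
# Remove just the symlink and point it at the CUDA 11.4 install
sudo rm /usr/local/cuda
sudo ln -s /usr/local/cuda-11.4 /usr/local/cuda
# Verify the toolkit now reports 11.4
/usr/local/cuda/bin/nvcc --version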
You should still be able to use SDK Manager to perform the post-flashing setup after you have flashed your custom BSP yourself. You can de-select the flashing step in SDK Manager and have it just do the post-flashing steps to install CUDA, cuDNN, OpenCV, etc. Or you may just be able to install them from the NVIDIA apt repo that typically comes with L4T.
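If you go the apt route, a rough sketch (assuming the standard NVIDIA L4T apt source that ships with L4T is present on your image) would be:

# Confirm the NVIDIA L4T repo is configured
cat /etc/apt/sources.list.d/nvidia-l4t-apt-source.list
# Install the JetPack components (CUDA, cuDNN, TensorRT, OpenCV, ...) matching your L4T release
sudo apt update
sudo apt install nvidia-jetpack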
Since JetPack 5 and newer, yes, the containers include CUDA/etc. installed inside the containers themselves (as opposed to JetPack 4, where these were mounted from the device). So if you are just using containers, technically you don't even need the CUDA Toolkit on your device. The l4t-pytorch container includes the full CUDA Toolkit (as do all of my containers, which are intended for development); however, you can find other base images on NGC that only include subsets of these, intended for deployment, to keep the image size down.
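As a quick sanity check that CUDA and PyTorch really are inside the image (using the r35.4.1 tag, which corresponds to JetPack 5.1.2 / L4T R35.4.1):

# Run the PyTorch container and query the GPU from inside it
sudo docker run --rm --runtime nvidia dustynv/l4t-pytorch:r35.4.1 \
  python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"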
OK, I can try this and check. However, as the hardware units are now in their final boxed and packed stage, we no longer have the recovery-mode option (shorting two pins) with the two USB 3.0 ports available for flashing, so I doubt I can do any more flashing.
Hence, I may have to fix any installation issues with tools/software/libraries without flashing from now on.
I just want to know: can we execute SDK Manager on the target, or is SDK Manager meant to be used from the host PC connected to the target?
As I currently cannot force my target into recovery mode by shorting the pins, can we execute SDK Manager from the host just by connecting the target to the host PC through a USB cable?
OK. I am currently following the documentation to install using the package manager method (either local/network type) or the runfile installation method, per the link below:
Hope my understanding is correct.
Now I am in a dilemma about which method to use: the container method or the package manager installation method? Which one do you suggest as simpler and more likely to work smoothly without much overhead?
Can you shed more light on NGC? Is it some other installation method apart from the package installation and container installation methods?
Sorry for the many queries. I want to understand these things better, hence all the questions. Thanks.
@nagesh_accord SDK Manager runs on the Ubuntu PC, and I believe it can reset your Jetson into recovery mode without shorting the pins, or if you run sudo reboot --force forced-recovery from your Jetson.
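Roughly, the sequence would look like this (worth double-checking the recovery command against the L4T documentation for your release):

# On the Jetson itself: reboot straight into forced recovery mode (no pin shorting needed)
sudo reboot --force forced-recovery
# Then on the Ubuntu host PC, with the flashing USB port connected, confirm the Jetson shows up
lsusb | grep -i nvidia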
I would personally recommend the container method since you seem to be deploying system(s) and want to install PyTorch and presumably other ML packages which can have complex dependencies.
Should I also generate this file for the installation of CUDA and other packages through SDK Manager?
Also, some of the .json files mentioned (the software reference file and hardware reference file) were not available for my unit, the Jetson AGX Xavier Industrial.
Will those be generated after I connect the target to the host PC and see the connection in SDK Manager?
I will try this step today.
Our customer wants all components of JetPack 5.1.2 to be installed on the target before release.
So I thought the SDK Manager method of installation would be better than the container method. Please clarify.
Does the container have all the required components of JetPack 5.1.2?
Thanks for this information.
It may take some time for me to understand this; I will go through it in the future.
I would only get into this after you are comfortable with the basics of SDK Manager. You already have your board flashed with L4T using your custom method, so then use SDK Manager to just install the JetPack components like CUDA/cuDNN/etc. (you can de-select the OS flashing step)
Yes, in that case I would have all the JetPack components installed normally, outside of a container. l4t-jetpack includes all the JetPack components, while mine vary depending on the container (for example, they may or may not have OpenCV and GStreamer, depending on the container requirements)
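If you do want the full JetPack component set inside a container, a sketch using the NGC image (assuming the r35.4.1 tag matches your L4T 35.4.1 / JetPack 5.1.2 release) would be:

# Pull the container that bundles all the JetPack components (CUDA, cuDNN, TensorRT, ...)
sudo docker pull nvcr.io/nvidia/l4t-jetpack:r35.4.1
# Start it with the NVIDIA runtime so the GPU is visible inside
sudo docker run --rm -it --runtime nvidia nvcr.io/nvidia/l4t-jetpack:r35.4.1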
We are worried about trying out the command "sudo reboot --force forced-recovery".
If we try this command and the unit enters recovery mode, we are afraid we may have to flash the unit to bring it back to normal boot mode. Please clarify?
Our unit is fully boxed up, and we don't want to flash anything again.
Also, we did not find steps in the documentation on where and how to enter manual or automatic recovery mode. Please let us know more details about this.
Are you sure you have your Jetson connected to your PC over the Jetson’s USB flashing port? Since you are using a custom system, I am not sure which port this would be. You would also see an NVIDIA USB device show up under lsusb if it were connected.
No, all you have to do is reboot the board again to get it out of recovery mode and back into normal mode, and it won’t have changed the device unless you actually reflashed it from SDK Manager.
This just pertains to the containers and has nothing to do with SDK Manager. SDK Manager initially communicates with your Jetson over USB, not TCP/IP. Later, during the post-flashing install steps, that USB connection will create a virtual ethernet adapter (so there is some networking used to install the packages like CUDA/cuDNN/etc.), but you shouldn't have to do that manually.
If you continue having questions or issues with SDK Manager, I recommend opening a new topic about that since you are using a custom system and not what this topic was originally about, then one of our experts in that area can help you with the finer details of that process specific to your circumstances. Thanks and best of luck!
We are aware of the USB port that we were using for flashing through the manual flash command method.
However, we do that by shorting two pins and forcing the unit into recovery mode so that it lists as an NVIDIA Corporation device when we run the lsusb command.
Now that we have packed and fully boxed the unit, the shorting of the pins has been removed and that USB port is used as a normal USB 3.0 port (not a flashing USB port any more), so I suppose we are not able to connect to SDK Manager.
Since you said we can force the unit into recovery mode by using a reboot command on the boxed-up unit, without shorting pins, I will try that and see tomorrow if possible. Thanks.
Sorry for slightly deviating into the SDK Manager topic here.
Thanks for the confirmation. We shall try that command tomorrow and see if the Jetson shows up with the lsusb command and whether SDK Manager automatically detects the board.
Thanks
I am getting the below error when trying to run the command
"./run.sh dustynv/l4t-pytorch:r35.4.1" on the Jetson target. Please let me know how to resolve this error. It says "/tmp/.docker.xauth" does not exist.
root@linux:/home/trident/Downloads# ./run.sh dustynv/l4t-pytorch:r35.4.1
localuser:root being added to access control list
xauth: file /tmp/.docker.xauth does not exist
sudo docker run --runtime nvidia -it --rm --network host --volume /tmp/argus_socket:/tmp/argus_socket --volume /etc/enctune.conf:/etc/enctune.conf --volume /etc/nv_tegra_release:/etc/nv_tegra_release --volume /tmp/nv_jetson_model:/tmp/nv_jetson_model --volume /home/trident/Downloads/data:/data --device /dev/snd --device /dev/bus/usb -e DISPLAY=:0 -v /tmp/.X11-unix/:/tmp/.X11-unix -v /tmp/.docker.xauth:/tmp/.docker.xauth -e XAUTHORITY=/tmp/.docker.xauth --device /dev/video0 dustynv/l4t-pytorch:r35.4.1
docker: Error response from daemon: unknown or invalid runtime name: nvidia.
See 'docker run --help'.
I was trying the third installation option (apart from the SDK Manager and container installation methods) to install CUDA 11.4.4 for my JetPack 5.1.2, as per the link below:
I am facing some errors, as shown below, when I execute the following command as per the documentation in the above link:
root@linux:/tmp# sudo apt-get install linux-headers-$(uname -r)
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package linux-headers-5.10.120-tegra
E: Couldn't find any package by glob 'linux-headers-5.10.120-tegra'
E: Couldn't find any package by regex 'linux-headers-5.10.120-tegra'
Any idea how to resolve this error?
Below is the information I have regarding the L4T version and other things:
I uninstalled CUDA 12.2 and installed CUDA 11.4 on my Jetson, which is the valid version for my JetPack 5.1.2, but it is still giving a driver version mismatch error, as shown below, when we execute the sample CUDA program. Any idea why?
I have a doubt: should I execute the remaining two commands, sudo apt-get --purge remove "*nvidia*" "libxnvctrl*" and sudo apt-get autoremove,
to clean up the CUDA 12.2 version fully?
[But at the same time I am afraid it may remove other NVIDIA drivers and my system may not boot at all! (That happened once earlier in a similar situation.)]
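To be safe, I am thinking of simulating the removal first and reading what apt would actually remove before committing (a sketch; the CUDA 12.2 package names are my assumption, to be checked against dpkg -l):

# Simulate only (-s): prints what would be removed without changing anything
sudo apt-get -s --purge remove "*nvidia*" "libxnvctrl*"
# On Jetson that pattern can also match the nvidia-l4t-* board-support packages,
# so a narrower cleanup of just the CUDA 12.2 packages may be safer, e.g.:
dpkg -l | grep 12-2
sudo apt-get -s --purge remove cuda-toolkit-12-2
sudo apt-get -s autoremove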
@nagesh_accord The docker daemon/services and the NVIDIA Container Runtime should already be installed by SDK Manager (unless you never had SDK Manager install them).
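A quick way to check whether the runtime is there and registered with Docker (the daemon.json snippet in the comment is the standard registration for nvidia-container-runtime, shown here as a sketch):

# Is the runtime binary installed? (it comes from the nvidia-container-* packages)
which nvidia-container-runtime
dpkg -l | grep nvidia-container
# Docker must also know about it; "nvidia" should be listed under Runtimes
sudo docker info | grep -i runtime
# If it is missing, register it in /etc/docker/daemon.json and restart Docker:
#   {
#     "runtimes": {
#       "nvidia": {
#         "path": "nvidia-container-runtime",
#         "runtimeArgs": []
#       }
#     }
#   }
sudo systemctl restart docker
# The earlier xauth warning is separate and harmless; creating the file quiets it
touch /tmp/.docker.xauth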
If you browse my various container packages from the link below, each has associated test script(s) that you could run, or pick commands from to run manually:
The docker rmi command will remove a container image that you previously downloaded. docker images will list the container images that you already have on your system (you may need sudo for these if your user isn’t part of the docker usergroup)
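For example, using the image tag from earlier in this thread:

# List the container images already downloaded on this system
sudo docker images
# Remove an image you no longer need
sudo docker rmi dustynv/l4t-pytorch:r35.4.1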