torch.cuda.is_available()=FALSE and [INFO]: Driver not installed... but it IS installed!?

Hello,
I’m trying to use CUDA on my Jetson Orin NX, but after quite a bit of research, torch.cuda.is_available() still returns False and I am unable to run the samples successfully. For context, I have Ubuntu 20.04, Python 3.8, JetPack 5.1.2, and PyTorch 2.1.0a (I verified that this is the compatible version). I attempted to install CUDA 11.4 with the following command: sudo sh cuda_11.4.0_470.42.01_linux_sbsa.run. I installed the Toolkit and the samples but not the driver, because I read that JetPack already includes it. Below are all the steps I’ve taken to verify the installation, along with their outputs:
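For reference, this is roughly how I am checking it from the shell (torch.version.cuda reports the CUDA version the wheel was built against, or None for a CPU-only build):

# Print the PyTorch version, the CUDA version it was built against, and the availability flag
python3 -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.cuda.is_available())"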

  1. export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}, export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}, source ~/.bashrc.
  2. nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jun__2_19:15:41_PDT_2021
Cuda compilation tools, release 11.4, V11.4.48
Build cuda_11.4.r11.4/compiler.30033411_0
  3. sudo tegrastats
10-10-2024 09:30:24 RAM 1194/15523MB (lfb 892x4MB) SWAP 0/7762MB (cached 0MB) CPU [1%@1984,0%@1984,0%@729,0%@729,0%@729,0%@729,0%@729,0%@729] EMC_FREQ 0%@2133 GR3D_FREQ 0%@[305,0] VIC_FREQ 729 APE 174 CV0@48.312C CPU@50.656C SOC2@49.031C SOC0@50.25C CV1@48.968C GPU@48.625C tj@51.875C SOC1@51.875C CV2@48.781C VDD_IN 4820mW/4820mW VDD_CPU_GPU_CV 607mW/607mW VDD_SOC 1498mW/1498mW
  4. lsmod | grep nvidia
nvidia_modeset       1093632  3
nvidia               1327104  7 nvidia_modeset
  5. Checked that /usr/local/cuda-11.4 exists and ls -l /usr/local/cuda returns lrwxrwxrwx 1 root root 21 oct 7 16:46 /usr/local/cuda -> /usr/local/cuda-11.4/
  6. /usr/lib/aarch64-linux-gnu/: Is a directory
  7. dpkg -l | grep libcudnn
ii  libcudnn8          8.6.0.166-1+cuda11.4    arm64    cuDNN runtime libraries
ii  libcudnn8-dev      8.6.0.166-1+cuda11.4    arm64    cuDNN development libraries and headers
ii  libcudnn8-samples  8.6.0.166-1+cuda11.4    arm64    cuDNN samples
  8. ls /usr/lib/aarch64-linux-gnu/libnvinfer* returns: ls: cannot access '/usr/lib/aarch64-linux-gnu/libnvinfer*': No such file or directory
  9. ls /usr/local/cuda/lib64 | grep libcudart
libcudart.so
libcudart.so.11.0
libcudart.so.11.4.43
libcudart_static.a
  10. ls /usr/local/cuda/lib64 | grep libcuda
libcudadevrt.a
libcudart.so
libcudart.so.11.0
libcudart.so.11.4.43
libcudart_static.a

  11. glxinfo | grep "OpenGL version" returns: OpenGL version string: 3.1 Mesa 21.2.6
  12. cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX Open Kernel Module for aarch64  35.4.1  Release Build  (buildbrain@mobile-u64-6422-d7000)  Tue Aug  1 12:45:41 PDT 2023
GCC version:  gcc version 9.3.0 (Buildroot 2020.08)
  13. sudo ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 100
-> no CUDA-capable device is detected
Result = FAIL
  14. cat /var/log/cuda-installer.log
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc

[INFO]: gcc version: gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)

[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: CUDA Samples 11.4
[WARNING]: Missing recommended library: libGLU.so
[WARNING]: Missing recommended library: libXmu.so
[INFO]: CUDA Documentation 11.4
[INFO]: Uninstall entry: dir /usr/local/cuda-11.4/
[INFO]: Uninstall entry: dir /usr/local/cuda-11.4/tools
[INFO]: Uninstall entry: dir /usr/local/cuda-11.4/bin
[INFO]: Uninstall entry: file /usr/local/cuda-11.4/DOCS eba0599f5ca8b9bda647f20ae3dd6345
[INFO]: md5 matches, removing file
[INFO]: Uninstall entry: file /usr/local/cuda-11.4/EULA.txt 1203d26f82bb4d2d485fe3be00efc3ae
[INFO]: md5 matches, removing file
[INFO]: Uninstall entry: file /usr/local/cuda-11.4/README 8514c9e2df920c17a65c54e30843d822
[INFO]: md5 matches, removing file
[INFO]: Uninstall entry: file /usr/local/cuda-11.4/bin/cuda-uninstaller c582e0add9ed4a47af8e4d2e2e698067
[INFO]: md5 matches, removing file
[INFO]: Uninstall entry: file /usr/local/cuda-11.4/tools/CUDA_Occupancy_Calculator.xls 056066ac68c81aa2d3da4793112748cd
[INFO]: md5 matches, removing file
[INFO]: Removing empty directory: /usr/local/cuda-11.4/tools
[INFO]: Successfully created directory: /usr/local/cuda-11.4/tools
[INFO]: Installed: /usr/local/cuda-11.4/DOCS
[INFO]: Installed: /usr/local/cuda-11.4/EULA.txt
[INFO]: Installed: /usr/local/cuda-11.4/README
[INFO]: Installed: /usr/local/cuda-11.4/bin/cuda-uninstaller
[INFO]: Installed: /usr/local/cuda-11.4/tools/CUDA_Occupancy_Calculator.xls
[WARNING]: Cannot find manpages to install.
  15. cat /etc/X11/xorg.conf
# Copyright (c) 2011-2013 NVIDIA CORPORATION.  All Rights Reserved.

#
# This is the minimal configuration necessary to use the Tegra driver.
# Please refer to the xorg.conf man page for more configuration
# options provided by the X server, including display-related options
# provided by RandR 1.2 and higher.

# Disable extensions not useful on Tegra.
Section "Module"
    Disable     "dri"
    SubSection  "extmod"
        Option  "omit xfree86-dga"
    EndSubSection
EndSection

Section "Device"
    Identifier  "Tegra0"
    Driver      "nvidia"
# Allow X server to be started even if no display devices are connected.
    Option      "AllowEmptyInitialConfiguration" "true"
EndSection
  16. sudo dmesg | grep nvidia
[    0.002849] DTS File Name: /dvs/git/dirty/git-master_linux/kernel/kernel-5.10/arch/arm64/boot/dts/../../../../../../hardware/nvidia/platform/t23x/p3768/kernel-dts/tegra234-p3767-0000-p3509-a02.dts
[   14.934640] nvidia: loading out-of-tree module taints kernel.
[   19.318537] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for aarch64  35.4.1  Release Build  (buildbrain@mobile-u64-6422-d7000)  Tue Aug  1 12:45:42 PDT 2023
  17. gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
  18. lspci | grep -i nvidia
0004:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)
0008:00:00.0 PCI bridge: NVIDIA Corporation Device 229c (rev a1)

I have no clue what the issue might be, so any input would be helpful. Thank you in advance!

Could you try installing CUDA following these instructions?

Also, please make sure you use an L4T build of PyTorch: jetson-containers/packages/l4t/l4t-pytorch at master · dusty-nv/jetson-containers · GitHub
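For example, something along these lines should pull an l4t-pytorch container matching your JetPack/L4T release (a rough sketch; the exact image tag that autotag resolves may differ on your system):

# Clone the jetson-containers repo and install its helper scripts (jetson-containers, autotag)
git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh

# Start the l4t-pytorch container that matches the installed L4T version
jetson-containers run $(autotag l4t-pytorch)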

Regards,
Allan Navarro
Embedded SW Engineer at RidgeRun

Contact us: support@ridgerun.com
Developers wiki: https://developer.ridgerun.com/
Website: www.ridgerun.com

Hi,

SBSA CUDA is for dGPU devices.
Please use the default CUDA version from JetPack.
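For reference, a minimal sketch of what that usually looks like on JetPack 5.x (package names assume the preinstalled L4T apt repository; adjust as needed):

# Remove the SBSA toolkit installed by the .run file (its uninstaller was placed in the bin directory)
sudo /usr/local/cuda-11.4/bin/cuda-uninstaller

# Reinstall the Jetson (L4T) CUDA packages that ship with JetPack
sudo apt update
sudo apt install -y cuda-toolkit-11-4    # or: sudo apt install -y nvidia-jetpack  (all JetPack components)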

Thanks.

Thank you so much for your responses, @AastaLLL and @allan.navarro!

I now understand the issue. The link provided in the instructions for installing CUDA mentions two methods. I attempted the first one, but it didn’t work. The second method states that the first step is to “copy the cuda-repo-l4t-11-4-local_11.4.14-1_arm64.deb file to the target Orin.” However, I’m unsure how to do this. I assume I need to use wget, but I don’t know the URL for that file.

Could you please guide me on how to find the correct URL, or suggest an alternative way to transfer the file to the target Orin? I appreciate your help!

Thank you!

Normally they are downloaded to your PC under ~/Downloads/nvidia if you used SDK Manager to install JetPack. If not, you could try this link to explore the available .debs; I suggest you try some of those.
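For example, assuming the Jetson is reachable over the network as <user>@<jetson-ip> (hypothetical names; adjust the paths as needed), something like this copies the .deb to the board and installs the toolkit from it:

# On the host PC: copy the local-repo .deb to the Orin
scp ~/Downloads/nvidia/cuda-repo-l4t-11-4-local_11.4.14-1_arm64.deb <user>@<jetson-ip>:~/

# On the Orin: register the local repo, then install the toolkit from it
sudo dpkg -i ~/cuda-repo-l4t-11-4-local_11.4.14-1_arm64.deb
# dpkg prints a key-setup command for this local repo; run it, then:
sudo apt update
sudo apt install -y cuda-toolkit-11-4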

Regards,
Allan Navarro
Embedded SW Engineer at RidgeRun

Contact us: support@ridgerun.com
Developers wiki: https://developer.ridgerun.com/
Website: www.ridgerun.com