"torch.version.cuda" remains the same after manually upgrading CUDA to 11.8

Hi. I’m using an Orin NX 16GB with JetPack 5.1.

I need PyTorch 2.0+ for my applications, and I found that PyTorch 2.0 should be used with CUDA 11.7/11.8.

So I first followed the instructions here to install CUDA 11.8 on the Orin NX, then the instructions here to install torch 2.0.0.

After all of that, the issue is that when I run deviceQuery, it still reports CUDA 11.4:

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Orin"
  CUDA Driver Version / Runtime Version          11.4 / 11.4
  CUDA Capability Major/Minor version number:    8.7
  Total amount of global memory:                 14485 MBytes (15188504576 bytes)
  (008) Multiprocessors, (128) CUDA Cores/MP:    1024 CUDA Cores
  GPU Max Clock rate:                            918 MHz (0.92 GHz)
  Memory Clock rate:                             918 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        167936 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS

Additionally, torch.version.cuda still reports ‘11.4’. What should I do?
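For completeness, this is how I’m checking it from Python (a simple one-liner; python3 here is assumed to be the interpreter the wheel was installed into):

$ python3 -c "import torch; print(torch.__version__, torch.version.cuda)"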

Thanks in advance.

root@nvidia-desktop:~# pip list | grep torch
torch                     2.0.0+nv23.5
root@nvidia-desktop:/usr/local/cuda-11.8# ll compat
total 69400
drwxr-xr-x  2 root root     4096 Aug  1 12:28 .
drwxr-xr-x 13 root root     4096 Aug  1 13:48 ..
lrwxrwxrwx  1 root root       12 Aug 27  2022 libcuda.so -> libcuda.so.1
lrwxrwxrwx  1 root root       14 Aug 27  2022 libcuda.so.1 -> libcuda.so.1.1
-rw-r--r--  1 root root 27547936 Aug 27  2022 libcuda.so.1.1
-rw-r--r--  1 root root 24472968 Aug 27  2022 libnvidia-nvvm.so
lrwxrwxrwx  1 root root       17 Aug 27  2022 libnvidia-nvvm.so.4 -> libnvidia-nvvm.so
-rw-r--r--  1 root root 19030200 Aug 27  2022 libnvidia-ptxjitcompiler.so
lrwxrwxrwx  1 root root       27 Aug 27  2022 libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so

Hi,

Please export the paths to the compat folder to use a newer CUDA version:

export PATH=/usr/local/cuda-12/bin:${PATH}
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/compat

However, our prebuilt PyTorch is built against the default CUDA 11.4.
If you want to use a newer CUDA version, please build it from source.
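Roughly, a source build against CUDA 11.8 looks like the sketch below; the exact Jetson-specific flags are covered in the linked topic, so treat this only as an outline, not the official instructions:

# rough outline only: clone PyTorch, point the build at the 11.8 toolkit, and build a wheel
$ git clone --recursive --branch v2.0.0 https://github.com/pytorch/pytorch
$ cd pytorch
$ export USE_NCCL=0 USE_DISTRIBUTED=0          # commonly disabled on Jetson builds (assumption)
$ export TORCH_CUDA_ARCH_LIST="8.7"            # Orin SM architecture
$ export CUDA_HOME=/usr/local/cuda-11.8
$ export PATH=/usr/local/cuda-11.8/bin:${PATH}
$ pip3 install -r requirements.txt
$ python3 setup.py bdist_wheel
$ pip3 install dist/torch-*.whl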

The instructions can be found in the topic below:

Thanks.

Thanks for the reply.

After running the export commands, my CUDA driver version is now 11.8, but the runtime version is still 11.4.

Device 0: "Orin"
  CUDA Driver Version / Runtime Version          11.8 / 11.4
  CUDA Capability Major/Minor version number:    8.7
  Total amount of global memory:                 14485 MBytes (15188504576 bytes)
  (008) Multiprocessors, (128) CUDA Cores/MP:    1024 CUDA Cores
  GPU Max Clock rate:                            918 MHz (0.92 GHz)
  Memory Clock rate:                             918 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        167936 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.8, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS

Is this expected behaviour?

Hi,

Both should be 11.8.

My previous commands were for CUDA 12.
Have you updated the paths to 11.8?
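For 11.8, the equivalent exports would presumably be:

export PATH=/usr/local/cuda-11.8/bin:${PATH}
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/compat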

Thanks.

Yes, and here is the result:

root@nvidia-desktop:/usr/local/cuda-11.8/samples/1_Utilities/deviceQuery# export | grep PATH
declare -x LD_LIBRARY_PATH="/usr/local/cuda-11.8/compat"
declare -x PATH="/usr/local/cuda-11.8/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"

root@nvidia-desktop:/usr/local/cuda-11.8/samples/1_Utilities/deviceQuery# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Orin"
  CUDA Driver Version / Runtime Version          11.8 / 11.4
  CUDA Capability Major/Minor version number:    8.7
  Total amount of global memory:                 14485 MBytes (15188504576 bytes)
  (008) Multiprocessors, (128) CUDA Cores/MP:    1024 CUDA Cores
  GPU Max Clock rate:                            918 MHz (0.92 GHz)
  Memory Clock rate:                             918 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        167936 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.8, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS

Hi,

Could you run the following command again and share the log with us?

$ sudo apt-get -y -f reinstall cuda

Also, our prebuilt PyTorch is built for and compatible with the JetPack environment, so you don’t need to manually upgrade CUDA to 11.8.
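Note that deviceQuery links CUDART statically (its banner says “CUDART static linking”), so the runtime version it prints is fixed when the binary is compiled; a binary built with the 11.4 toolkit will keep reporting 11.4 even after the 11.8 compat libraries are exported. Rebuilding the sample under the 11.8 toolkit should change what it reports — a quick check, assuming the samples were installed with their usual Makefile:

# rebuild the sample with the 11.8 nvcc on PATH, then check the reported versions
$ cd /usr/local/cuda-11.8/samples/1_Utilities/deviceQuery
$ make clean && make
$ ./deviceQuery | grep "Runtime Version"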

Thanks.
