cudaGetDeviceCount returned 803

liu.jialu · September 5, 2025, 7:22am

According to the page: CUDA Setup — Jetson AGX Thor Developer Kit - User Guide

cd ~
mkdir -p $HOME/cuda-work && cd $HOME/cuda-work
docker run --rm -it \
    -v "$PWD":/workspace \
    -w /workspace \
    nvcr.io/nvidia/cuda:13.0.0-devel-ubuntu24.04

Inside the container:

apt update && apt install -y --no-install-recommends git make cmake
git clone --depth=1 --branch v13.0 https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples/Samples/1_Utilities/deviceQuery
cmake . -DGPU_TARGETS=all -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
./deviceQuery

It reports this error:

root@87473f37b836:/workspace/cuda-samples/Samples/1_Utilities/deviceQuery# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 803
-> system has unsupported display driver / cuda driver combination
Result = FAIL
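
For anyone decoding the number: as far as I know, 803 in the output above corresponds to cudaErrorSystemDriverMismatch in the CUDA runtime's cudaError_t enum, which matches the message deviceQuery prints. Below is a minimal sketch for pulling the code out of a deviceQuery log and naming it. The function names (cuda_error_name, extract_cuda_error_code) are made up for illustration, and the table covers only a handful of codes, so treat it as a starting point rather than a complete mapping.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of cuda-samples.

# Map a few CUDA runtime error codes to their cudaError_t names.
# Only a small subset is listed; anything else is reported as unknown.
cuda_error_name() {
  case "$1" in
    0)   echo "cudaSuccess" ;;
    35)  echo "cudaErrorInsufficientDriver" ;;
    100) echo "cudaErrorNoDevice" ;;
    803) echo "cudaErrorSystemDriverMismatch" ;;
    804) echo "cudaErrorCompatNotSupportedOnDevice" ;;
    999) echo "cudaErrorUnknown" ;;
    *)   echo "unknown code $1" ;;
  esac
}

# Read deviceQuery output on stdin and print the first numeric code
# found on a "cudaGetDeviceCount returned N" line (prints nothing if absent).
extract_cuda_error_code() {
  sed -n 's/^cudaGetDeviceCount returned \([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Example use (run where ./deviceQuery exists):
#   code=$(./deviceQuery | extract_cuda_error_code)
#   [ -n "$code" ] && cuda_error_name "$code"
```

If the log has no such line (for example when deviceQuery passes), extract_cuda_error_code prints nothing, so the example checks for an empty value before looking up the name.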

DaneLLL · September 5, 2025, 7:48am

Hi,
Please refer to the linked page.

Also make sure you can run nvidia-smi and get the device information. After that, deviceQuery should work.

liu.jialu · September 5, 2025, 8:20am

root@57a72c190374:/workspace# nvidia-smi
Fri Sep 5 08:12:57 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.00                 Driver Version: 580.00         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA Thor                    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   N/A   N/A               N/A / N/A |          Not Supported |      30%     Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

nvidia-smi works in the container, but deviceQuery still fails.
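
Since deviceQuery links CUDART statically and loads the driver library (libcuda.so.1) at run time, it can help to see which copies of that library the dynamic loader knows about inside the container. The sketch below filters `ldconfig -p` output for libcuda.so.1 entries. The function name list_libcuda_candidates is hypothetical, and this only inspects the loader cache; directories added through LD_LIBRARY_PATH are searched before the cache and will not show up here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldconfig -p` style output on stdin and print
# the resolved path of every libcuda.so.1 entry, in cache order.
# Each cache line looks like:
#   libcuda.so.1 (libc6,AArch64) => /path/to/libcuda.so.1
list_libcuda_candidates() {
  awk '/libcuda\.so\.1 / { print $NF }'
}

# Example use inside the container:
#   ldconfig -p | list_libcuda_candidates
#   echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
```

If the path printed first is not the driver library that matches the host's kernel driver, a driver/runtime mismatch error is a plausible outcome.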

DaneLLL · September 8, 2025, 5:20am

Hi,
Please run the following commands and see if they work:

nvidia@tegra-ubuntu:~/cudaSample$ export PATH=/usr/local/cuda-13.0/bin:$PATH
nvidia@tegra-ubuntu:~/cudaSample$ export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH
nvidia@tegra-ubuntu:~/cudaSample$ git clone -b v13.0 https://github.com/NVIDIA/cuda-samples
Cloning into 'cuda-samples'...
remote: Enumerating objects: 30467, done.
remote: Counting objects: 100% (14707/14707), done.
remote: Compressing objects: 100% (1489/1489), done.
remote: Total 30467 (delta 13847), reused 13221 (delta 13218), pack-reused 15760 (from 2)
Receiving objects: 100% (30467/30467), 135.82 MiB | 1.55 MiB/s, done.
Resolving deltas: 100% (26511/26511), done.
Note: switching to '3f1c50965017932fc81e6d94a3fc9e04c105b312'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

nvidia@tegra-ubuntu:~/cudaSample$ cd cuda-samples/
nvidia@tegra-ubuntu:~/cudaSample/cuda-samples$ cd Samples/1_Utilities/deviceQuery
nvidia@tegra-ubuntu:~/cudaSample/cuda-samples/Samples/1_Utilities/deviceQuery$ mkdir b
nvidia@tegra-ubuntu:~/cudaSample/cuda-samples/Samples/1_Utilities/deviceQuery$ cd b
nvidia@tegra-ubuntu:~/cudaSample/cuda-samples/Samples/1_Utilities/deviceQuery/b$ cmake ..
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- The CUDA compiler identification is NVIDIA 13.0.48
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-13.0/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-13.0/targets/sbsa-linux/include (found version "13.0.48")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Configuring done (1.7s)
-- Generating done (0.0s)
-- Build files have been written to: /home/nvidia/cudaSample/cuda-samples/Samples/1_Utilities/deviceQuery/b
nvidia@tegra-ubuntu:~/cudaSample/cuda-samples/Samples/1_Utilities/deviceQuery/b$ make
[ 50%] Building CXX object CMakeFiles/deviceQuery.dir/deviceQuery.cpp.o
[100%] Linking CXX executable deviceQuery
[100%] Built target deviceQuery
nvidia@tegra-ubuntu:~/cudaSample/cuda-samples/Samples/1_Utilities/deviceQuery/b$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Thor"
  CUDA Driver Version / Runtime Version          13.0 / 13.0
  CUDA Capability Major/Minor version number:    11.0
  Total amount of global memory:                 125772 MBytes (131881820160 bytes)
  (020) Multiprocessors, (128) CUDA Cores/MP:    2560 CUDA Cores
  GPU Max Clock rate:                            1049 MHz (1.05 GHz)
  Memory Clock rate:                             0 Mhz
  Memory Bus Width:                              0-bit
  L2 Cache Size:                                 33554432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        233472 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 13.0, CUDA Runtime Version = 13.0, NumDevs = 1
Result = PASS

liu.jialu · September 8, 2025, 9:26am

It works on the host, but it still doesn't work in Docker.

AastaLLL · September 9, 2025, 12:55am

Hi,

The driver in the container (580.65.06) doesn’t support Thor.

Instead, run the container with --runtime nvidia and export the path to the local driver:

$ sudo docker run --rm -it -v "$PWD":/workspace -w /workspace --runtime nvidia nvcr.io/nvidia/cuda:13.0.0-devel-ubuntu24.04

# export LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/nvidia:$LD_LIBRARY_PATH
# cd cuda-samples/Samples/1_Utilities/deviceQuery
# ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA Thor"
  CUDA Driver Version / Runtime Version          13.0 / 13.0
  CUDA Capability Major/Minor version number:    11.0
  Total amount of global memory:                 125772 MBytes (131881820160 bytes)
  (020) Multiprocessors, (128) CUDA Cores/MP:    2560 CUDA Cores
  GPU Max Clock rate:                            1049 MHz (1.05 GHz)
  Memory Clock rate:                             0 Mhz
  Memory Bus Width:                              0-bit
  L2 Cache Size:                                 33554432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        233472 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 13.0, CUDA Runtime Version = 13.0, NumDevs = 1
Result = PASS

Thanks.
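
One small caveat about the export line above: when LD_LIBRARY_PATH is unset or empty, `dir:$LD_LIBRARY_PATH` leaves a trailing colon, and the dynamic loader treats an empty entry as the current working directory. A minimal sketch of a prepend that avoids this is shown below. The function name prepend_ld_library_path is hypothetical; the directory in the usage comment is the one from this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: put a directory at the front of LD_LIBRARY_PATH.
# The ${VAR:+...} expansion adds the ":" separator only when the variable
# already has a non-empty value, so no empty entry is created.
prepend_ld_library_path() {
  LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export LD_LIBRARY_PATH
}

# Example use inside the container started with --runtime nvidia:
#   prepend_ld_library_path /usr/lib/aarch64-linux-gnu/nvidia
#   ./deviceQuery
```

The helper always prepends, even if the directory is already listed further back, because the point here is to have the local driver directory searched first.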

system · September 23, 2025, 12:55am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
