bandwidthTest example throws cudaErrorCallRequiresNewerDriver error when launched via nv-nsight-cu-cli

Consider the bandwidthTest example from CUDA samples. It works as expected when compiled and launched normally.

$ /usr/local/cuda-12.3/bin/nvcc bandwidthTest.cu -o bandwidthTest

$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: Tesla P40
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     11.8

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     13.2

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)        Bandwidth(GB/s)
   32000000                     283.9

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

However, it doesn’t work under Nsight Compute. Note that I have to use a standalone install of Nsight Compute 2019.5 because it’s the last version that supports Tesla P40 GPUs.

$ /usr/local/NVIDIA-Nsight-Compute-2019.5/nv-nsight-cu-cli ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

==PROF== Connected to process 8325 (/data/research/cuda-playground/polar/bandwidthTest)
cudaGetDeviceProperties returned 36
-> API call is not supported in the installed CUDA driver
CUDA error at bandwidthTest.cu:256 code=36(cudaErrorCallRequiresNewerDriver) "cudaSetDevice(currentDevice)"
==PROF== Disconnected from process 8325
==ERROR== The application returned an error code (1)
==WARNING== No kernels were profiled
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option
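The `CUDA error at bandwidthTest.cu:256 code=36(...)` line comes from the samples' error-checking helper. A minimal sketch of an equivalent check (a hypothetical `CHECK` macro, not the samples' actual `helper_cuda.h` code) shows where that message format originates; under the old profiler, the wrapped call returns error code 36:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical minimal equivalent of the samples' checkCudaErrors helper.
// Prints file, line, numeric code, error name, and the failing call.
#define CHECK(call)                                                         \
    do {                                                                    \
        cudaError_t err = (call);                                           \
        if (err != cudaSuccess) {                                           \
            fprintf(stderr, "CUDA error at %s:%d code=%d(%s) \"%s\"\n",     \
                    __FILE__, __LINE__, (int)err, cudaGetErrorName(err),    \
                    #call);                                                 \
            exit(EXIT_FAILURE);                                             \
        }                                                                   \
    } while (0)

int main() {
    // Under Nsight Compute 2019.5 with a too-new driver, this is the call
    // that fails with code 36 (cudaErrorCallRequiresNewerDriver).
    CHECK(cudaSetDevice(0));
    return 0;
}
```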

Some system information:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy

$ sudo ubuntu-drivers debug
...... (verbose output omitted)
=== matching driver packages ===
nvidia-driver-525: installed: 525.147.05-0ubuntu0.22.04.1   available: 525.147.05-0ubuntu0.22.04.1 (auto-install)  [distro]  non-free  modalias: pci:v000010DEd00001B38sv000010DEsd000011D9bc03sc02i00  path: /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0  vendor: NVIDIA Corporation  model: GP102GL [Tesla P40]  support level: PB
nvidia-driver-390: installed: <none>   available: 390.157-0ubuntu0.22.04.2  [distro]  non-free  modalias: pci:v000010DEd00001B38sv000010DEsd000011D9bc03sc02i00  path: /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0  vendor: NVIDIA Corporation  model: GP102GL [Tesla P40]  support level: Legacy
nvidia-driver-545: installed: <none>   available: 545.23.08-0ubuntu1  [third party]  non-free  modalias: pci:v000010DEd00001B38sv000010DEsd000011D9bc03sc02i00  path: /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0  vendor: NVIDIA Corporation  model: GP102GL [Tesla P40]
nvidia-driver-525-server: installed: <none>   available: 525.147.05-0ubuntu0.22.04.1  [distro]  non-free  modalias: pci:v000010DEd00001B38sv000010DEsd000011D9bc03sc02i00  path: /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0  vendor: NVIDIA Corporation  model: GP102GL [Tesla P40]  support level: PB
nvidia-driver-535: installed: <none>   available: 535.86.10-0ubuntu1  [third party]  non-free  modalias: pci:v000010DEd00001B38sv000010DEsd000011D9bc03sc02i00  path: /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0  vendor: NVIDIA Corporation  model: GP102GL [Tesla P40]
nvidia-driver-450-server: installed: <none>   available: 450.248.02-0ubuntu0.22.04.1  [distro]  non-free  modalias: pci:v000010DEd00001B38sv000010DEsd000011D9bc03sc02i00  path: /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0  vendor: NVIDIA Corporation  model: GP102GL [Tesla P40]  support level: LTSB
nvidia-driver-470: installed: <none>   available: 470.223.02-0ubuntu0.22.04.1  [distro]  non-free  modalias: pci:v000010DEd00001B38sv000010DEsd000011D9bc03sc02i00  path: /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0  vendor: NVIDIA Corporation  model: GP102GL [Tesla P40]  support level: LTSB
nvidia-driver-470-server: installed: <none>   available: 470.223.02-0ubuntu0.22.04.1  [distro]  non-free  modalias: pci:v000010DEd00001B38sv000010DEsd000011D9bc03sc02i00  path: /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0  vendor: NVIDIA Corporation  model: GP102GL [Tesla P40]  support level: LTSB
nvidia-driver-418-server: installed: <none>   available: 418.226.00-0ubuntu5~0.22.04.1  [distro]  non-free  modalias: pci:v000010DEd00001B38sv000010DEsd000011D9bc03sc02i00  path: /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0  vendor: NVIDIA Corporation  model: GP102GL [Tesla P40]  support level: LTSB
nvidia-driver-535-server: installed: <none>   available: 535.129.03-0ubuntu0.22.04.1  [distro]  non-free  modalias: pci:v000010DEd00001B38sv000010DEsd000011D9bc03sc02i00  path: /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0  vendor: NVIDIA Corporation  model: GP102GL [Tesla P40]  support level: PB

$ nvidia-smi
Tue Jan  9 18:18:57 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P40           On   | 00000000:04:00.0 Off |                  Off |
| N/A   13C    P8     8W / 250W |      2MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P40           On   | 00000000:42:00.0 Off |                  Off |
| N/A   14C    P8     8W / 250W |      2MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  525.147.05  Wed Oct 25 20:27:35 UTC 2023
GCC version:  gcc version 12.3.0 (Ubuntu 12.3.0-1ubuntu1~22.04)

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Nov__3_17:16:49_PDT_2023
Cuda compilation tools, release 12.3, V12.3.103
Build cuda_12.3.r12.3/compiler.33492891_0

$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "Tesla P40"
  CUDA Driver Version / Runtime Version          12.0 / 12.3
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 24446 MBytes (25632964608 bytes)
  (030) Multiprocessors, (128) CUDA Cores/MP:    3840 CUDA Cores
  GPU Max Clock rate:                            1531 MHz (1.53 GHz)
  Memory Clock rate:                             3615 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 3145728 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 4 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "Tesla P40"
  CUDA Driver Version / Runtime Version          12.0 / 12.3
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 24446 MBytes (25632964608 bytes)
  (030) Multiprocessors, (128) CUDA Cores/MP:    3840 CUDA Cores
  GPU Max Clock rate:                            1531 MHz (1.53 GHz)
  Memory Clock rate:                             3615 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 3145728 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 66 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from Tesla P40 (GPU0) -> Tesla P40 (GPU1) : No
> Peer access from Tesla P40 (GPU1) -> Tesla P40 (GPU0) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.0, CUDA Runtime Version = 12.3, NumDevs = 2
Result = PASS

nvidia-bug-report.log.gz (545.9 KB)

How do I fix this so that I can instrument my code with nv-nsight-cu-cli?

Hi, @qingyao

→ API call is not supported in the installed CUDA driver
This error indicates that this version of Nsight Compute is not compatible with the installed driver.
Nsight Compute 2019.5 was released together with CUDA 10.2. Can you please install the driver from the CUDA 10.2 package?

Thanks for the reply. Can you specify the exact driver version that is packed with CUDA 10.2? According to Table 2. CUDA Toolkit 10.x Minimum Required Driver Versions, any driver version >= 440.33 should be compatible, which includes version 545.23.08 that I’m currently using. I don’t think I should install driver version 440, because the text under Table 3 on the same page notes that “Branches R515, R510, R465, R460, R455, R450, R440, R418, R410, R396, R390 are end of life and are not supported targets for compatibility.”
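Independent of the compatibility tables, the driver and runtime versions can be queried directly at run time; a minimal sketch (assumes the CUDA runtime library is installed):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driverVersion = 0, runtimeVersion = 0;
    // Highest CUDA version the installed driver supports
    cudaDriverGetVersion(&driverVersion);
    // CUDA version this binary's runtime was built against
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("Driver supports up to CUDA %d.%d; runtime is CUDA %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    return 0;
}
```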

Additionally, I cannot just install CUDA 10.2, because the download page only contains installers that support Ubuntu 16.04 and 18.04, but I am using Ubuntu 22.04.

Here are the drivers available to me. You may want to pick one from the list, or point me to a URL from which I can download alternative drivers.

$ sudo ubuntu-drivers list
nvidia-driver-525-server, (kernel modules provided by linux-modules-nvidia-525-server-generic-hwe-22.04)
nvidia-driver-525, (kernel modules provided by linux-modules-nvidia-525-generic-hwe-22.04)
nvidia-driver-470-server, (kernel modules provided by linux-modules-nvidia-470-server-generic-hwe-22.04)
nvidia-driver-470, (kernel modules provided by linux-modules-nvidia-470-generic-hwe-22.04)
nvidia-driver-535, (kernel modules provided by linux-modules-nvidia-535-generic-hwe-22.04)
nvidia-driver-418-server, (kernel modules provided by nvidia-dkms-418-server)
nvidia-driver-545, (kernel modules provided by nvidia-dkms-545)
nvidia-driver-390, (kernel modules provided by nvidia-dkms-390)
nvidia-driver-535-server, (kernel modules provided by linux-modules-nvidia-535-server-generic-hwe-22.04)
nvidia-driver-450-server, (kernel modules provided by nvidia-dkms-450-server)

Can you please try the .run installer from CUDA Toolkit 10.2 Download | NVIDIA Developer to see if it can be installed on your machine?

I got the following error while installing CUDA 10.2.

[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc

[INFO]: gcc version: gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

[ERROR]: unsupported compiler version: 11.4.0. Use --override to override this check.

To fix that, I installed GCC 8 on Ubuntu 22.04 according to this answer, only to get another error

[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc

[INFO]: gcc version: gcc version 8.4.0 (Ubuntu 8.4.0-3ubuntu2)

[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 440.33.01
[INFO]: Executing NVIDIA-Linux-x86_64-440.33.01.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd  2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 440.33.01 failed, quitting

The error message is too ambiguous for me to do anything. Any advice on how to proceed?

For the record, I would rather we fix the driver issue on CUDA 12.3 than try to install CUDA 10.2. The latter is hopelessly outdated, and the bundled driver version 440.33.01 has already reached its end of life. I wish I could just install CUDA 12.3 normally. Newer drivers are supposed to work with old CUDA versions, aren’t they?

I have reinstalled CUDA 12.3. Current system information: nvidia-bug-report.log.gz (544.3 KB)

Now, here is an observation that may help you debug the issue: when the source filename ends in .cpp, everything works fine:

$ nvcc -o deviceQuery deviceQuery.cpp

$ ./deviceQuery | tail
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 66 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from Tesla P40 (GPU0) -> Tesla P40 (GPU1) : No
> Peer access from Tesla P40 (GPU1) -> Tesla P40 (GPU0) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.3, CUDA Runtime Version = 12.3, NumDevs = 2
Result = PASS

$ /usr/local/NVIDIA-Nsight-Compute-2019.5/nv-nsight-cu-cli ./deviceQuery | tail
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from Tesla P40 (GPU0) -> Tesla P40 (GPU1) : No
> Peer access from Tesla P40 (GPU1) -> Tesla P40 (GPU0) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.3, CUDA Runtime Version = 12.3, NumDevs = 2
Result = PASS
==PROF== Connected to process 6111 (/data/research/cuda-playground/polar/deviceQuery)
==PROF== Disconnected from process 6111
==WARNING== No kernels were profiled
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option

However, if I rename the file to *.cu without changing its content, it no longer works with nv-nsight-cu-cli (“API call is not supported in the installed CUDA driver”).

$ nvcc -o deviceQuery deviceQuery.cu

$ ./deviceQuery | tail
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 66 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from Tesla P40 (GPU0) -> Tesla P40 (GPU1) : No
> Peer access from Tesla P40 (GPU1) -> Tesla P40 (GPU0) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.3, CUDA Runtime Version = 12.3, NumDevs = 2
Result = PASS

$ /usr/local/NVIDIA-Nsight-Compute-2019.5/nv-nsight-cu-cli ./deviceQuery | tail
 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 36
-> API call is not supported in the installed CUDA driver
Result = FAIL
==PROF== Connected to process 6235 (/data/research/cuda-playground/polar/deviceQuery)
==PROF== Disconnected from process 6235
==ERROR== The application returned an error code (1)
==WARNING== No kernels were profiled
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option

I hope this can convince you that it is possible to make nv-nsight-cu-cli from Nsight Compute 2019.5 work with the latest driver, so it should not be necessary to install the driver associated with CUDA 10.2.

@qingyao

Thanks for the investigation.
It is true that Nsight Compute 2019.5 cannot work with drivers R525 and R545. I am afraid this is expected, as some APIs must have been dropped in those drivers.

However, we tried internally and found that R470 does work with 2019.5, and that branch is still in maintenance.
So I would suggest you try the latest R470 driver: Data Center Driver for Linux x64 | 470.223.02 | Linux 64-bit | NVIDIA


Sounds great! However, I seem to have a partial install of CUDA 11.8, which gives me a warning message:

Existing package manager installation of the driver found. It is strongly recommended that you remove this before continuing.
Abort
Continue

I executed the following command, but I still got the same warning message, presumably because I installed CUDA 11.8 with the runfile instead of a local deb file.

sudo apt purge "nvidia*" "cuda*" -y
sudo apt remove "nvidia-*" -y
sudo rm "/etc/apt/sources.list.d/cuda*"
sudo apt autoremove -y && sudo apt autoclean -y
sudo rm -rf "/usr/local/cuda*"

May I ask how to completely uninstall that “existing package manager installation of the driver” before I can proceed?

Please refer to the CUDA Installation Guide for Linux.

Thanks, I have successfully installed driver version 470.223.02. I then proceeded to install CUDA version 11.8 (deselecting driver installation so that 470.223.02 won’t get overwritten), because according to this CUDA 11.8.x works with driver >=450.80.02.

However, I got the following message from the CUDA 11.8 installer, which says driver >= 520.00 is required. Doesn’t it contradict the document linked above?

$ sudo sh cuda_11.8.0_520.61.05_linux.run
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-11.8/

Please make sure that
 -   PATH includes /usr/local/cuda-11.8/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.8/lib64, or, add /usr/local/cuda-11.8/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.8/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 520.00 is required for CUDA 11.8 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log

I ignored the warning, rebooted the machine, and recompiled bandwidthTest, but it did not work:

$ ./bandwidthTest --device=all
[CUDA Bandwidth Test] - Starting...

!!!!!Cumulative Bandwidth to be computed from all the devices !!!!!!

Running on...

 Device 0: Tesla P40
 Device 1: Tesla P40
 Quick Mode

CUDA error at bandwidthTest.cu:686 code=222(cudaErrorUnsupportedPtxVersion) "cudaEventCreate(&start)"

Meanwhile, deviceQuery said something like the following:

$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "Tesla P40"
  CUDA Driver Version / Runtime Version          11.4 / 11.8
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 24452 MBytes (25639649280 bytes)
  (030) Multiprocessors, (128) CUDA Cores/MP:    3840 CUDA Cores
  GPU Max Clock rate:                            1531 MHz (1.53 GHz)
  Memory Clock rate:                             3615 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 3145728 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 4 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "Tesla P40"
  CUDA Driver Version / Runtime Version          11.4 / 11.8
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 24452 MBytes (25639649280 bytes)
  (030) Multiprocessors, (128) CUDA Cores/MP:    3840 CUDA Cores
  GPU Max Clock rate:                            1531 MHz (1.53 GHz)
  Memory Clock rate:                             3615 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 3145728 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 66 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from Tesla P40 (GPU0) -> Tesla P40 (GPU1) : No
> Peer access from Tesla P40 (GPU1) -> Tesla P40 (GPU0) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.8, NumDevs = 2
Result = PASS

I’m not sure why it says “CUDA Driver Version = 11.4”. Presumably because the R470 driver only supports CUDA versions up to 11.4? Do I have to install CUDA 11.4 instead of CUDA 11.8? That would be tricky because it does not support Ubuntu 22.04.

Here is the latest debug report:
nvidia-bug-report.log.gz (525.8 KB)

You have Nsight Compute 2019.5 and driver 470 now. Why do you want to install CUDA 11.8?

I need access to the nvcc compiler to compile stuff and a CUDA runtime to run the programs. I downloaded Nsight Compute 2019.5 separately from this link, not as a part of another CUDA installation, so I need to install some CUDA toolkit that works with driver 470.

I chose CUDA 11.8 because it’s the latest CUDA that works with driver 470. I know Nsight Compute 2019.5 is bundled with CUDA 10.2, but it didn’t install as I said before.

Got it. Yes, you first need to make sure that your sample works with the driver.

Yes. Do you have a recommended CUDA version for driver 470? As I stated above, CUDA 11.8 doesn’t work even though your document claims it is compatible with driver 470.

Sorry. This combination works in our environment: Nsight Compute 2019.5 + CUDA samples built with the CUDA 11.8 compiler + the 470 driver.

cudaErrorUnsupportedPtxVersion: maybe you can double-check whether your compile command specifies the target GPU architecture.

Oh, this is fantastic. Thank you! Everything works after I pass -arch sm_61 to nvcc for P40 GPUs. Here is the final system information for future reference: nvidia-bug-report.log.gz (498.4 KB)
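For anyone hitting the same cudaErrorUnsupportedPtxVersion, a small sketch for discovering the right -arch value for each installed GPU (assumes the CUDA runtime and a working driver; for a Tesla P40 it reports sm_61):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // major/minor form the compute capability, e.g. 6.1 -> -arch=sm_61
        printf("Device %d (%s): compile with -arch=sm_%d%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```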

Glad to know everything works now !
