Nvaccelinfo can't find GPU, but lspci can

Mat, I may still have PATH problems. See:

malcolm73 nvaccelinfo -v

CUDA Driver Version: 12030
could not initialize CUDA runtime, error code=100
No accelerators found.
Check the permissions on your CUDA device
malcolm74 lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GP100GL [Quadro GP100] (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 0fb1 (rev a1)
malcolm75

The Makefile and two .bashrc files are the same as the ones that you already have, so no change there!

Malcolm

If you run the command as “root”, does it work?

What’s the output from the following commands:

  • nvidia-smi
  • ls -l /dev/nv*
  • lsmod | grep nvidia
  • lsmod | grep nouveau
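
In case it helps, the four checks above can be collected into one small script. This is only a sketch: `run_check` is a hypothetical helper name, and it lists `/dev/nvidia*` rather than `/dev/nv*` (my assumption, to keep NVMe and nvram devices out of the output).

```shell
#!/bin/sh
# Run one diagnostic command, label its output, and keep going on failure.
run_check() {
    echo "== $* =="
    # sh -c so that pipes and globs inside the command string are honoured
    sh -c "$*" 2>&1 || echo "(failed or printed nothing, exit status $?)"
    echo
}

run_check 'nvidia-smi'            # can the tools talk to the driver?
run_check 'ls -l /dev/nvidia*'    # do the device nodes exist?
run_check 'lsmod | grep nvidia'   # is the NVIDIA kernel module loaded?
run_check 'lsmod | grep nouveau'  # is the open-source nouveau module loaded instead?
```

If `lsmod | grep nvidia` prints nothing, the NVIDIA kernel module is not loaded, which by itself is enough for `nvidia-smi` and `nvaccelinfo` to fail.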

Thanks, Mat. Here are the results:

malcolm75 nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

malcolm76 ls -l /dev/nv*
crw-rw-rw-. 1 root root 195, 255 Jan 5 12:28 /dev/nvidiactl
crw-------. 1 root root 240, 0 Jan 4 23:27 /dev/nvme0
brw-rw----. 1 root disk 259, 0 Jan 4 23:27 /dev/nvme0n1
brw-rw----. 1 root disk 259, 1 Jan 4 23:27 /dev/nvme0n1p1
brw-rw----. 1 root disk 259, 2 Jan 4 23:27 /dev/nvme0n1p2
brw-rw----. 1 root disk 259, 3 Jan 4 23:27 /dev/nvme0n1p3
crw-------. 1 root root 240, 1 Jan 4 23:27 /dev/nvme1
brw-rw----. 1 root disk 259, 4 Jan 4 23:27 /dev/nvme1n1
crw-------. 1 root root 10, 144 Jan 4 23:27 /dev/nvram
malcolm77 lsmod | grep nvidia
malcolm78 lsmod | grep nouveau
malcolm79
I am very surprised by the nvidia-smi result, as I had run it before and it seemed fine. I guess it isn’t now! Malcolm

I suspect you don’t have a CUDA driver installed or there was a problem during installation.

Can you try installing a CUDA driver? See: Official Drivers | NVIDIA

Mat, here’s what I tried:

malcolm97 nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:17:24_PDT_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0
malcolm98

I don’t see the driver version in there, just as it was missing from
nvidia-smi. I know it was installed - quite a high number.

On the link that you provided, there is no option for CUDA 12.3, which
is what I have installed. So there is no real info on what the driver
version should be!

Malcolm

nvcc is the CUDA compiler, not the driver.

You need to install the CUDA driver separately by downloading it from the link I provided above.
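
To see the driver version itself rather than the compiler version, one option is to read the "NVRM version" line that the kernel module publishes in `/proc/driver/nvidia/version`. A minimal sketch follows; `nvrm_driver_version` is a hypothetical helper name, and it assumes the first dotted number on that line is the driver version.

```shell
#!/bin/sh
# Extract the driver version (e.g. 545.23.06) from an "NVRM version" line.
nvrm_driver_version() {
    # assumption: the first dotted number on the line is the driver version
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# The file exists only while the NVIDIA kernel module is loaded.
if [ -r /proc/driver/nvidia/version ]; then
    nvrm_driver_version "$(grep NVRM /proc/driver/nvidia/version)"
else
    echo "NVIDIA kernel module not loaded (no /proc/driver/nvidia/version)"
fi
```

If the file is missing, the driver is either not installed or not loaded, regardless of what `nvcc --version` reports.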

Mat, as I said, the link does not include info for CUDA 12.3. For 12.2 it recommends 535.129.03. Other NVIDIA documentation states flatly 545.23.08. Where do I download this from, and where do I save it to?

Thanks. Malcolm.

Sorry, I’m not clear on what the issue is. As long as the CUDA driver is newer than the CUDA SDK you’re compiling with, you’re fine. The driver is backwards compatible except for very old devices.

Though if there is a reason why you need an older driver, the archive page can be found at: Official Advanced Driver Search | NVIDIA
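
As a rough way to check the "driver newer than SDK" condition, the integer that nvaccelinfo prints as "CUDA Driver Version" (12030 in this thread) can be decoded and compared with the toolkit release that `nvcc --version` reports. This is a sketch only: both function names are hypothetical, the decoding assumes the usual CUDA encoding of 1000*major + 10*minor, and `sort -V` assumes GNU coreutils.

```shell
#!/bin/sh
# Decode e.g. 12030 into 12.3 (assumes 1000*major + 10*minor).
decode_cuda_version() {
    local v="$1"
    echo "$(( v / 1000 )).$(( (v % 1000) / 10 ))"
}

# Succeeds when the driver's CUDA version is at least the toolkit's.
# Both arguments are major.minor strings, e.g. 12.3.
driver_supports_toolkit() {
    local driver="$1" toolkit="$2"
    # sort -V orders version strings; the lower one comes out first
    [ "$(printf '%s\n%s\n' "$toolkit" "$driver" | sort -V | head -n 1)" = "$toolkit" ]
}

if driver_supports_toolkit "$(decode_cuda_version 12030)" "12.3"; then
    echo "driver is new enough for the toolkit"
else
    echo "driver is older than the toolkit"
fi
```

Note that this compares CUDA versions, not the 5xx driver package numbers, so it says nothing about which driver package to download.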

Mat, I need to download 535.129.03. Where do I find it? Then which directory do I install it in? Malcolm.

Correction: I need 545.23.08. Sorry for the mistake. Malcolm

Mat, the 12.3 Update Release Notes clearly say that 545.23.08 is needed. Actually, from another forum, the correct number is 545.23.06. I have found it on the NVIDIA site, and installed it. I can now use nvaccelinfo, as shown below. I wonder whether this was related to the earlier problem with QD? Malcolm.

malcolm189 nvaccelinfo

CUDA Driver Version: 12030
NVRM version: NVIDIA UNIX x86_64 Kernel Module 545.23.06 Sun Oct 15 17:43:11 UTC 2023

Device Number: 0
Device Name: Quadro GP100
Device Revision Number: 6.0
Global Memory Size: 17064263680
Number of Multiprocessors: 56
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 65536
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 2147483647 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1442 MHz
Execution Timeout: No
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 715 MHz
Memory Bus Width: 4096 bits
L2 Cache Size: 4194304 bytes
Max Threads Per SMP: 2048
Async Engines: 2
Unified Addressing: Yes
Managed Memory: Yes
Concurrent Managed Memory: Yes
Preemption Supported: Yes
Cooperative Launch: Yes
Default Target: cc60
malcolm190

It says that 545.23.08 is the minimum needed, so more recent drivers such as the current 535.146.02 can be used as well. Either way, I’m glad it’s working for you.

You asked: “I wonder whether this was related to the earlier problem with QD?”

It’s still unclear what was causing the CUDAROOT issue. However, before you installed the driver, nvaccelinfo showed the CUDA version and then failed, which may indicate that the stub libcuda.so was getting picked up. (libcuda.so is the CUDA driver library; the stub library is only used to resolve symbols during linking and has no functionality.)
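
One way to look for a stub being picked up at run time is to check whether a stubs directory is on `LD_LIBRARY_PATH` and which libcuda the dynamic loader knows about. This is a sketch, not a definitive test: `find_stub_dirs` is a hypothetical helper, and it assumes the stub library lives in a directory literally named `stubs`, as in the usual CUDA toolkit layout.

```shell
#!/bin/sh
# Print each entry of a colon-separated path list that ends in "stubs".
find_stub_dirs() {
    local IFS=:
    local dir
    for dir in $1; do
        case "$dir" in
            */stubs|*/stubs/) echo "$dir" ;;
        esac
    done
}

# Any output here means a stubs directory is on the run-time library path.
find_stub_dirs "${LD_LIBRARY_PATH:-}"

# Libraries registered with the dynamic loader; a working driver install
# normally shows libcuda.so.1 here.
ldconfig -p 2>/dev/null | grep libcuda || echo "no libcuda in the loader cache"
```

A stubs directory is fine on the link line, but it should not be on the run-time path, since the stub has no functionality.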