I am trying to update my CUDA installation from 10.0 (where everything worked well) to 10.2 on openSUSE Tumbleweed. The NVIDIA GPU is a secondary GPU, not driving the display. I use the official Tumbleweed RPMs for the NVIDIA GPU drivers.
I followed the official CUDA installation guide. Everything went well so far: I updated the NVIDIA drivers, installed CUDA 10.2 from the runfile (choosing not to install the driver from the CUDA installer, as I already have it from the RPMs), and nvidia-smi shows that everything is in order:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 660     Off  | 00000000:02:00.0 N/A |                  N/A |
| 32%   45C    P0    N/A /  N/A |      0MiB /  1999MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+
I also compiled the CUDA samples successfully (using g++-7 as the compiler; the system default is g++-9), so nvcc works as well. However, when I try to list the GPUs, the result is error 999:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 999
-> unknown error
Result = FAIL
Anything else that tries to interface with CUDA at runtime gives the same error. Any ideas on how to debug this issue?
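One way to narrow this down is to bypass the runtime and ask the driver library directly, since error 999 from the runtime is just the generic "unknown error" code. The following is a minimal sketch, assuming the driver library is installed as libcuda.so.1 and can be loaded through Python's ctypes; the function name probe_cuda_driver is made up for this example. It calls cuInit and cuDeviceGetCount from the CUDA driver API and reports the symbolic error name if either fails.

```python
import ctypes


def probe_cuda_driver(libname="libcuda.so.1"):
    """Return (status, detail) from a direct CUDA driver API probe.

    libname is an assumption: the usual soname of the NVIDIA driver library.
    """
    try:
        cuda = ctypes.CDLL(libname)
    except OSError as exc:
        # The library could not be found or loaded at all.
        return ("no-library", str(exc))

    def error_name(code):
        # cuGetErrorName fills in a pointer to a static string.
        name = ctypes.c_char_p()
        if cuda.cuGetErrorName(code, ctypes.byref(name)) == 0 and name.value:
            return name.value.decode()
        return "unrecognized error code"

    # cuInit must succeed before any other driver API call.
    rc = cuda.cuInit(0)
    if rc != 0:
        return ("cuInit-failed", f"{rc} ({error_name(rc)})")

    count = ctypes.c_int(0)
    rc = cuda.cuDeviceGetCount(ctypes.byref(count))
    if rc != 0:
        return ("cuDeviceGetCount-failed", f"{rc} ({error_name(rc)})")
    return ("ok", f"{count.value} device(s)")


if __name__ == "__main__":
    print(probe_cuda_driver())
```

If cuInit already fails here, the problem most likely sits between the driver library and the kernel module rather than in the CUDA 10.2 toolkit itself. Running the script once as a normal user and once as root shows whether the failure depends on permissions.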
I am having the exact same problem with CUDA 10.2 on openSUSE Tumbleweed, using the latest NVIDIA driver, 440.82. Compiling the CUDA samples with gcc-7 worked, but calling ./deviceQuery gives the same error message.
However, I can run ./deviceQuery as root, and after doing this once it also works as a normal user. All of a sudden Blender finds my CUDA device too, and everything is fine until the next reboot. Then I have to run ./deviceQuery as root again to get CUDA to recognize my GPU.
I had the same problem, and running as root solved it just as you said! I think it started after I recently updated some NVIDIA libraries and software, but I can't be sure.
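The fact that one run as root fixes things until the next reboot suggests (this is an assumption, not something confirmed in this thread) that the NVIDIA device nodes or the nvidia_uvm kernel module are not set up at boot and only get created when a privileged process first touches CUDA. A rough sketch to check this as a normal user right after a reboot is below. The device paths are the usual defaults for a single-GPU system and may differ on your machine, and the helper names check_nodes and module_loaded are made up for this example.

```python
import os

# Assumed default device nodes for a single GPU; adjust for your system.
DEVICE_NODES = ["/dev/nvidiactl", "/dev/nvidia0", "/dev/nvidia-uvm"]


def check_nodes(paths):
    """Map each path to 'missing', 'read/write ok' or 'no read/write access'."""
    result = {}
    for path in paths:
        if not os.path.exists(path):
            result[path] = "missing"
        elif os.access(path, os.R_OK | os.W_OK):
            result[path] = "read/write ok"
        else:
            result[path] = "no read/write access"
    return result


def module_loaded(name, modules_file="/proc/modules"):
    """Return True/False if the module list is readable, None otherwise."""
    try:
        with open(modules_file) as fh:
            # The first field of each line in /proc/modules is the module name.
            return any(line.split()[0] == name for line in fh if line.strip())
    except OSError:
        return None


if __name__ == "__main__":
    for path, status in check_nodes(DEVICE_NODES).items():
        print(f"{path}: {status}")
    print("nvidia_uvm loaded:", module_loaded("nvidia_uvm"))
```

If /dev/nvidia-uvm is missing or nvidia_uvm is not loaded before the root run and both are present afterwards, that would support the assumption above and point at module loading or device permissions rather than at the CUDA toolkit.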