CUDA 7.5 not working on GTX 980 Ti in Linux Mint 17.2

Hi everyone,

I successfully installed CUDA 7.5 on Linux Mint 17.2. When I try to use it in Theano, I get the following error:

WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu0 is not available  (error: Unable to get the number of gpus available: CUDA driver version is insufficient for CUDA runtime version)

Also, when I ran the CUDA installation program, it said “You’re attempting to install on an unsupported configuration.” I installed it anyway.

My driver version is nvidia-352:

modinfo nvidia_352
filename:       /lib/modules/3.16.0-38-generic/kernel/drivers/char/drm/nvidia_352.ko
alias:          char-major-195-*
version:        352.63
supported:      external
license:        NVIDIA
...

I really don’t know what to do next … I read somewhere that this must be a problem with the CUDA install, or that my driver doesn’t support the 980 Ti.
Any help would be greatly appreciated.

Thanks!

What is the result of running nvidia-smi?

And what is the result of running deviceQuery?

these two:

nvidia-smi
Fri Dec 18 23:44:50 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 352.63     Driver Version: 352.63         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980 Ti  Off  | 0000:01:00.0      On |                  N/A |
|  0%   54C    P8    26W / 250W |    409MiB /  6143MiB |     13%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1484    G   /usr/bin/X                                     196MiB |
|    0      2769    G   cinnamon                                        65MiB |
|    0      3021    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd   124MiB |
+-----------------------------------------------------------------------------+

deviceQuery:

deviceQuery
deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

Thanks for the help!

You have a corrupted driver install of some sort. My guess is that you had a previous package manager install, and then later you did a CUDA toolkit install, and/or a driver runfile install.

You can read the installation guide to learn about the issues associated with this. If you want to start over, start with a clean install of the OS. Then if you want CUDA 7.5 and driver 352.63, use the 352.63 runfile installer, then use the CUDA 7.5 runfile installer, but select no when prompted to install the driver provided by the toolkit (which is 352.39).
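
For reference, the version comparison behind this advice can be sketched in shell. This is only an illustration and not part of any installer: version_ge is a hypothetical helper name, and it assumes a sort that supports -V (GNU coreutils does). The two version numbers are the ones quoted in this thread.

```shell
# version_ge A B: succeed (exit status 0) if dotted version A >= B.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    # The smaller version sorts first, so A >= B exactly when that one is B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed=352.63   # driver version reported by nvidia-smi in this thread
bundled=352.39     # driver version bundled with the CUDA 7.5 runfile (see above)

if version_ge "$installed" "$bundled"; then
    echo "driver $installed is at least $bundled"
else
    echo "driver $installed is older than $bundled"
fi
```

With the numbers from this thread the check passes, which fits the diagnosis of a corrupted install rather than a driver that is simply too old.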

I’d really like to avoid reinstalling the OS, since I have many programs, data, etc.
Edit: also because Mint’s audio drivers refused to work with my motherboard, and I spent two days rebooting and cursing before finally getting sound to work.
But I’m relatively new to Linux, so I really don’t know; maybe reinstalling is the way to go, perhaps precisely because I’m new.

As you guessed, I first installed from the installer file, then ran:

sudo apt-get install nvidia-cuda-toolkit

Things started to go downhill from that point on.

For another try at this, I read the Linux Getting Started guide and uninstalled CUDA, but couldn’t uninstall the runfile driver with this:

sudo /usr/bin/nvidia-uninstall
sudo: /usr/bin/nvidia-uninstall: command not found

I also did this:

sudo apt-get remove nvidia-cuda-toolkit

Then I reinstalled CUDA 7.5 from the runfile.

Then I read the thread here: https://devtalk.nvidia.com/default/topic/760872/ubuntu-12-04-error-cudagetdevicecount-returned-30/

deviceQuery still says:

deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

Commands from the mentioned thread return these results:

nvidia-settings -q NvidiaDriverVersion
  Attribute 'NvidiaDriverVersion' (virostatiq:0.0): 352.63
  Attribute 'NvidiaDriverVersion' (virostatiq:0[gpu:0]): 352.63
uname -r
3.16.0-38-generic
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  352.63  Sat Nov  7 21:25:42 PST 2015
GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)
lsmod | grep -i nvidia
nvidia_uvm             76757  0 
nvidia               8642894  59 nvidia_uvm
drm                   311018  5 i915,drm_kms_helper,nvidia
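
A small sketch of how the kernel module version could be pulled out of that /proc output in a script, for example to compare it with the version nvidia-smi reports. nvrm_version is a hypothetical helper name, and the pattern assumes the "NVRM version:" line has the layout shown above.

```shell
# nvrm_version: read text in the format of /proc/driver/nvidia/version on
# stdin and print the kernel module version from the "NVRM version:" line.
nvrm_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Only read the real file if it exists; it is absent when no NVIDIA
# kernel module is loaded.
if [ -r /proc/driver/nvidia/version ]; then
    echo "kernel module: $(nvrm_version < /proc/driver/nvidia/version)"
else
    echo "/proc/driver/nvidia/version is not readable on this machine"
fi
```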

The output below is the reason I installed the NVIDIA CUDA toolkit via apt on top of the runfile install last time. Before that, it returned what I had read it should, though I forget the details.

nvcc -V
The program 'nvcc' is currently not installed. You can install it by typing:
sudo apt-get install nvidia-cuda-toolkit
nvidia-smi
Fri Dec 25 15:49:01 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 352.63     Driver Version: 352.63         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980 Ti  Off  | 0000:01:00.0      On |                  N/A |
|  0%   51C    P8    17W / 250W |    307MiB /  6143MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      4157    G   /usr/bin/X                                     174MiB |
|    0      4336    G   cinnamon                                        41MiB |
|    0      4488    G   ...s-passed-by-fd --v8-snapshot-passed-by-fd    67MiB |
+-----------------------------------------------------------------------------+
cat /etc/ld.so.conf.d/nvidia-lib64.conf
cat: /etc/ld.so.conf.d/nvidia-lib64.conf: No such file or directory
sudo modinfo nvidia-352-uvm
filename:       /lib/modules/3.16.0-38-generic/kernel/drivers/video/nvidia_352_uvm.ko
supported:      external
license:        MIT
srcversion:     A347F556C35EE8E88DF9DEB
depends:        nvidia
vermagic:       3.16.0-38-generic SMP mod_unload modversions 
parm:           NVuvm_prefetch_stats:int
parm:           NVuvm_prefetch_threshold:int
parm:           NVuvm_prefetch_adaptive:int
parm:           NVuvm_prefetch_epoch:int
parm:           NVuvm_prefetch_sparsity_inc:int
parm:           NVuvm_prefetch_sparsity_dec:int
parm:           NVuvm_prefetch:int
sudo update-alternatives --config x86_64-linux-gnu_gl_conf
There are 3 choices for the alternative x86_64-linux-gnu_gl_conf (providing /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf).

  Selection    Path                                       Priority   Status
------------------------------------------------------------
  0            /usr/lib/nvidia-352/ld.so.conf              8604      auto mode
  1            /usr/lib/nvidia-352-prime/ld.so.conf        8603      manual mode
* 2            /usr/lib/nvidia-352/ld.so.conf              8604      manual mode
  3            /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf   500       manual mode

Contents of /usr/lib/nvidia-352-prime/ld.so.conf:

/usr/lib/x86_64-linux-gnu/mesa
/usr/lib/i386-linux-gnu/mesa

Contents of /usr/lib/nvidia-352/ld.so.conf:

/usr/lib/nvidia-352
/usr/lib32/nvidia-352

Is there anything I can do to avoid an OS reinstall? I’m afraid wrong settings may now be spread across many files, and that fixing this is impossible short of finding them all, but I’m willing to try.
I’d be very grateful for any help.

Thanks,

Marko

Yes, mixing a runfile install with a package manager install is a no-no.

If you want to unravel it without an OS reinstall, try to clean up the install using the methods described in the installation guide:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#handle-uninstallation

Pick a method that you would like to use moving forward: either package manager method, or runfile installer method. Use that method and only that method moving forward, whether it is for toolkit installs or driver installs. Both methods are covered in the installation guide.
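
Before cleaning up, it can help to see which of the two methods has left files behind. The following is a read-only sketch under some assumptions: it expects a Debian-style system with dpkg, it looks for the uninstall scripts at the locations the installation guide describes (/usr/bin/nvidia-uninstall and /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl), and list_runfile_uninstallers with its ROOT argument is a hypothetical helper made up for this example.

```shell
# Read-only check for leftovers from both install methods; nothing is removed.

# list_runfile_uninstallers [ROOT]: print uninstall scripts left behind by
# runfile installs. ROOT is a hypothetical prefix used only for testing;
# leave it empty to inspect the real filesystem.
list_runfile_uninstallers() {
    root="${1:-}"
    for f in "$root"/usr/bin/nvidia-uninstall \
             "$root"/usr/local/cuda-*/bin/uninstall_cuda_*.pl; do
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}

# Packages that came from the package manager (Debian/Ubuntu/Mint).
if command -v dpkg >/dev/null 2>&1; then
    echo "Package manager installs:"
    dpkg -l | grep -i -E 'nvidia|cuda' || echo "  (none found)"
fi

echo "Runfile uninstallers:"
list_runfile_uninstallers
```

If both lists are non-empty, the two methods have been mixed. Runfile installs are then removed by running the listed scripts with sudo, and package installs with the package manager (for example sudo apt-get --purge remove <package_name>).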

Thanks. I’m probably going to have to reinstall the OS, but I’d like an opinion first. When I first installed CUDA 7.5 from the installer (no apt install was present then), it warned me that I was attempting to install on an unsupported configuration. The only NVIDIA software I had then was the display driver.

How can I find out what exactly is unsupported? Is it possible that it’s a hardware component, because the computer is new? As I said, I’ll resign myself to reinstalling, or upgrading to Mint 17.3, but what if I get that exact warning on a clean install then? It’ll all be in vain.

Thanks!

I would start by reading the Linux installation guide I’ve already linked. Refer to section 1.1 and Table 1. Linux Mint 17 is not listed as a supported OS/distro.

Having said that, I frequently use Fedora 20 with CUDA 7.5, although it’s also not listed as a supported distro (and I get that unsupported message). In my opinion, it works just fine, at least for me and my purposes.

And in any event I don’t know the heuristic behind that message.

I got it to work. I reinstalled the OS (Linux Mint 17.3), then installed CUDA with the package manager immediately after booting the system for the first time. I didn’t install the NVIDIA display driver separately. It all worked more or less out of the box.

Theano was still giving an error, so I had to do

sudo ldconfig /usr/local/cuda-7.5/lib64

Then it started to work.
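
A sketch for checking whether the fix is in effect, assuming the remaining Theano error came from the CUDA runtime library (libcudart) missing from the dynamic linker cache; the post does not say which error it was. cache_has_lib is a hypothetical helper name, and the file name /etc/ld.so.conf.d/cuda.conf is an arbitrary choice.

```shell
cuda_lib=/usr/local/cuda-7.5/lib64   # path used in the post above

# cache_has_lib NAME: succeed if NAME appears in `ldconfig -p` style
# output supplied on stdin.
cache_has_lib() {
    grep -q -F "$1"
}

if ! command -v ldconfig >/dev/null 2>&1; then
    echo "ldconfig is not on PATH (it often lives in /sbin)"
elif ldconfig -p | cache_has_lib libcudart.so; then
    echo "libcudart is in the linker cache"
else
    echo "libcudart is not in the linker cache; to register it persistently:"
    echo "  echo $cuda_lib | sudo tee /etc/ld.so.conf.d/cuda.conf"
    echo "  sudo ldconfig"
fi
```

A directory passed to ldconfig on the command line is only scanned for that one run, so listing it in a file under /etc/ld.so.conf.d keeps it in the cache the next time the cache is rebuilt.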

Thanks, lxbob, for encouraging me to go with the reinstall.