seems like cuda can't recognize my device

user2717954 · October 14, 2017, 7:10pm

hi

I’m trying to install cuda 8 on my machine but I can’t run any sample.

Basically I followed the guide found at https://www.nvidia.com/en-us/data-center/gpu-accelerated-applications/tensorflow/

everything seemed to work fine (i.e. no errors during installation, etc.)

nvidia-smi shows that the device is recognized by my machine/driver:

$ nvidia-smi
Sat Oct 14 22:04:03 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.82                 Driver Version: 375.82                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 0000:01:00.0     Off |                  N/A |
| N/A   38C    P0    N/A /  N/A |      0MiB /  4044MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

but when trying to run the nbody sample

$ ./nbody

Error: only 0 Devices available, 1 requested.  Exiting.

I'm not sure what other info I can give that will help identify the problem. When I tried running the vectorAdd sample, I got a slightly different error:

Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!

any help will be appreciated.

thanks

njuffa · October 14, 2017, 8:30pm

As the error message says: The installed CUDA driver is too old. Install the latest available driver for your platform.

user2717954 · October 14, 2017, 9:12pm

I might not fully understand the terminology used so forgive me if my question is stupid.

I believe some of the applications I want to use are not supported by cuda 9 yet.
Do you mean I need to install cuda 9, or is there a newer driver available for cuda 8 as well? If the latter, how can I find out which driver I currently have installed? Also note that I strictly followed the guide found on this site.

what is the difference between CUDA driver version and CUDA runtime version? how do I verify that they match?

thanks

njuffa · October 14, 2017, 9:32pm

Software is typically layered, arranged as a “stack” (imagine a stack of pancakes). Code higher up in the stack makes calls to code lower in the stack, which in turn calls code even lower in the stack, … you get the picture.

In this case, the CUDA runtime sits on top of the CUDA driver in the stack. Your CUDA-accelerated applications sit, in all likelihood, on top of the CUDA runtime.

A particular version of the CUDA runtime will require a certain minimum version of the CUDA driver, and complain if that is not in place (see error message above). It will also operate with versions of the driver newer than the minimum required one. So generally the recommendation is to simply use the latest available driver. You can download drivers here:

http://www.nvidia.com/Download/index.aspx?lang=en-us
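For the questions above about which driver is installed and which runtime version it has to satisfy, here is a minimal bash sketch that prints the driver version reported by nvidia-smi, the version of the kernel module actually loaded (from /proc/driver/nvidia/version), and the toolkit release reported by nvcc. The function names are hypothetical, and the paths assume a standard Linux install with the toolkit under /usr/local/cuda. Once the driver works, the deviceQuery sample also prints a "CUDA Driver Version / Runtime Version" line that answers the same question directly.

```shell
#!/usr/bin/env bash
# Sketch: show driver and toolkit (runtime) versions side by side.
# Every tool is optional; missing ones are reported instead of failing.

# Extract "8.0" from nvcc output such as
# "Cuda compilation tools, release 8.0, V8.0.61".
parse_nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' <<<"$1"
}

# Extract "375.82" from a /proc/driver/nvidia/version line such as
# "NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.82  Wed Jul 19 ...".
parse_kernel_module_version() {
  sed -n 's/.*Kernel Module *\([0-9][0-9.]*\).*/\1/p' <<<"$1"
}

report_versions() {
  local nvcc_bin
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "driver (nvidia-smi):     $(nvidia-smi --query-gpu=driver_version --format=csv,noheader)"
  else
    echo "driver (nvidia-smi):     nvidia-smi not found"
  fi

  if [ -r /proc/driver/nvidia/version ]; then
    echo "driver (kernel module):  $(parse_kernel_module_version "$(cat /proc/driver/nvidia/version)")"
  else
    echo "driver (kernel module):  not loaded"
  fi

  # Assumed default toolkit location if nvcc is not on PATH.
  nvcc_bin=$(command -v nvcc || echo /usr/local/cuda/bin/nvcc)
  if [ -x "$nvcc_bin" ]; then
    echo "toolkit (nvcc):          $(parse_nvcc_release "$("$nvcc_bin" --version)")"
  else
    echo "toolkit (nvcc):          nvcc not found"
  fi
}

report_versions
```

If nvidia-smi and the kernel module report different versions, that alone suggests the driver pieces on the machine do not belong together.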

user2717954 · October 15, 2017, 5:57pm

and what about this comment in the installation guide?

$ sudo apt-get install nvidia- (press tab to see latest). 375 (do not use 378, may cause login loops)

thanks

njuffa · October 15, 2017, 6:02pm

What about it? It is better to ask specific questions. BTW, what installation guide is this?

user2717954 · October 15, 2017, 6:20pm

you recommend updating the driver. However, I'm using a specific driver recommended by nvidia itself, together with the cuda runtime version recommended in the same place. I had the login loop issue before and it was pretty annoying to solve, so I would like to avoid risky driver versions.

this is the installation guide: TensorFlow | NVIDIA NGC

user2717954 · October 15, 2017, 6:21pm

sorry. double post

njuffa · October 15, 2017, 6:34pm

The document you point to seems to have been last updated for CUDA 8. So obviously it talks about drivers available around the time CUDA 8 became available, and it seems that one particular version, 378, caused problems at that time, thus the comment says to avoid it. Since then driver versions have progressed to 385 or thereabouts.

The interaction between the CUDA runtime and the CUDA driver is such that each version of the runtime requires a certain minimum driver version, and it complains if that is not in place, as you have seen. But it is generally fine to use newer drivers than the minimum required, with (rare) exceptions where newer drivers introduce a serious bug not present in older ones.
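The rule described here (each runtime version needs a minimum driver version, and newer drivers are fine) can be sketched as a small bash check. The minimum driver versions in the table are assumed from NVIDIA's published CUDA toolkit/driver compatibility table for Linux x86_64; re-check them against the current release notes before relying on them. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of "minimum driver required, newer drivers accepted".

# Minimum Linux driver per CUDA toolkit version (assumed values, see lead-in).
min_driver_for_cuda() {
  case "$1" in
    7.5) echo 352.31 ;;
    8.0) echo 375.26 ;;   # CUDA 8.0 GA2 (8.0.61); GA1 (8.0.44) needed 367.48
    9.0) echo 384.81 ;;
    *) return 1 ;;        # unknown toolkit version: no answer rather than a guess
  esac
}

# Return 0 if dotted version $1 >= $2, comparing numeric fields left to right.
version_ge() {
  local -a a b
  local i x y
  IFS=. read -r -a a <<<"$1"
  IFS=. read -r -a b <<<"$2"
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    x=${a[i]:-0}
    y=${b[i]:-0}
    if ((10#$x > 10#$y)); then return 0; fi
    if ((10#$x < 10#$y)); then return 1; fi
  done
  return 0
}

# driver_supports_cuda DRIVER_VERSION CUDA_VERSION
# Exit status: 0 = new enough, 1 = too old, 2 = CUDA version not in the table.
driver_supports_cuda() {
  local min
  min=$(min_driver_for_cuda "$2") || return 2
  version_ge "$1" "$min"
}
```

For example, `driver_supports_cuda 375.82 8.0 && echo ok` prints `ok` under this table, so by version number alone the 375.82 driver from the original post meets CUDA 8.0's minimum. That is consistent with the suspicion that the installation, not the version number, is the problem.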

I have no idea how you installed your system. If you installed the complete CUDA 8.0 package, it should have come with a matching CUDA driver and you shouldn’t be seeing the error message you reported encountering in your original post. Your current installation may be incomplete or corrupted; I do not know of a way to diagnose that remotely.

Maybe txbob will come along and have more specific advice for you; he is more of a Linux installation expert than I am. There is certainly no issue with the device: GeForce GTX 960M is a Maxwell-family GPU with compute capability 5.0, and I would expect it to be a number of years before CUDA drops support for that.
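To confirm the compute capability on the machine itself, the deviceQuery sample reports it. This bash sketch pulls it out of deviceQuery's output and compares it with a minimum. Assumptions: deviceQuery from the CUDA samples has been built in the current directory, its output contains a line of the form "CUDA Capability Major/Minor version number:    5.0", and 3.0 is used as the minimum because CUDA 9 removed support for Fermi (2.x) GPUs. On a broken driver install deviceQuery fails before printing this line, so the check is only useful once the driver works. Function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read the compute capability from deviceQuery and compare to a minimum.

# "  CUDA Capability Major/Minor version number:    5.0" -> "5.0"
parse_compute_capability() {
  sed -n 's|.*CUDA Capability Major/Minor version number: *\([0-9][0-9]*\.[0-9][0-9]*\).*|\1|p' <<<"$1"
}

# Return 0 if capability $1 >= $2 (fine for single-digit minors like 3.5, 5.0).
capability_at_least() {
  awk -v have="$1" -v need="$2" 'BEGIN { exit !(have + 0 >= need + 0) }'
}

if [ -x ./deviceQuery ]; then
  cc=$(parse_compute_capability "$(./deviceQuery 2>&1)")
  if [ -z "$cc" ]; then
    echo "deviceQuery reported no device (see its error message)"
  elif capability_at_least "$cc" 3.0; then
    echo "compute capability $cc: at or above 3.0"
  else
    echo "compute capability $cc: below 3.0"
  fi
else
  echo "./deviceQuery not found; build it from the CUDA samples first"
fi
```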

Robert_Crovella · October 15, 2017, 8:35pm

You have a broken driver install. The “driver version is insufficient for CUDA runtime version” is a very solid, reliable indicator of that. Not sure what else there is to say. The fact that nvidia-smi gives typical output is, unfortunately, not a guarantee that the driver is installed correctly for CUDA. It is normally a good indicator, but it is not conclusive. The NVIDIA driver structure that supports GPU computing involves multiple linux modules, and it is possible that enough of these modules are “harmonized” so that nvidia-smi will work, but some other aspect is not. This is occasionally the outcome when people have struggled with the driver install. Trying multiple different things in sequence from an otherwise clean config is a recipe for disaster.
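One heuristic way to look for the kind of partial mismatch described here is to compare the version of the loaded kernel module with the version of the user-space libcuda.so that the dynamic loader resolves, and to check whether the nvidia_uvm module is loaded. This is a bash sketch, not an official diagnostic: it assumes x86_64 Linux, that the real libcuda file is named libcuda.so.&lt;driver version&gt;, and the function names are hypothetical. nvidia_uvm is often loaded on demand, so "not loaded" before any CUDA program has run is not by itself an error.

```shell
#!/usr/bin/env bash
# Heuristic sketch: do the kernel module and user-space libcuda agree?

# "/usr/lib/x86_64-linux-gnu/libcuda.so.375.82" -> "375.82"; empty otherwise.
libcuda_version_from_path() {
  basename "$1" | sed -n 's/^libcuda\.so\.\([0-9][0-9]*\.[0-9][0-9.]*\)$/\1/p'
}

check_driver_pieces() {
  local ldconfig_bin kmod lib libver=""
  ldconfig_bin=$(command -v ldconfig || echo /sbin/ldconfig)

  # Version of the kernel module that is actually loaded (empty if none).
  kmod=$(sed -n 's/.*Kernel Module *\([0-9][0-9.]*\).*/\1/p' /proc/driver/nvidia/version 2>/dev/null)

  # 64-bit libcuda.so.1 known to the loader, resolved through its symlinks.
  lib=$("$ldconfig_bin" -p 2>/dev/null | awk '/libcuda\.so\.1 .*x86-64/ { print $NF; exit }')
  if [ -n "$lib" ]; then
    lib=$(readlink -f "$lib")
    libver=$(libcuda_version_from_path "$lib")
  fi

  echo "kernel module: ${kmod:-not loaded}"
  echo "libcuda:       ${lib:-not found by the loader} ${libver:+(version $libver)}"
  if [ -n "$kmod" ] && [ -n "$libver" ] && [ "$kmod" != "$libver" ]; then
    echo "MISMATCH: kernel module $kmod vs libcuda $libver"
  fi

  if [ -d /sys/module/nvidia_uvm ]; then
    echo "nvidia_uvm:    loaded"
  else
    echo "nvidia_uvm:    not loaded"
  fi
  return 0
}

check_driver_pieces
```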

Start over with a clean install of linux. Then follow the instructions in the linux install guide (for CUDA 8, if you wish). It will work. You also evidently have availed yourself of instructions for avoiding the login-loop. Avail yourself of those, again, as you start with a pristine install.

If you prefer, follow the instructions for cleanup of previous installs contained in the aforementioned install guide. It may work. But the complexity of the driver combined with the myriad possibilities that people may have performed previously as part of the “history” of their machines makes it impossible IMO to provide a concise set of recovery/cleanup instructions that are guaranteed to work in every case, with the exception of “Start over with a clean install,… follow the instructions in the linux install guide precisely…, etc.”
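Before attempting the cleanup route, it can help to see what earlier attempts left behind, in particular whether runfile installs and package-manager installs were mixed (the install guide has a section on handling such conflicting installation methods). This bash sketch only reads and removes nothing; the actual removal steps are the ones in the install guide. The paths assume the CUDA 8-era Linux layout (/usr/local/cuda-X.Y, the runfile uninstallers /usr/bin/nvidia-uninstall and /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl) and a Debian/Ubuntu system for the package listing; function names are hypothetical.

```shell
#!/usr/bin/env bash
# Read-only sketch: list traces of earlier NVIDIA/CUDA installs.

# "/usr/local/cuda-8.0" -> "8.0"
toolkit_version_from_dir() {
  echo "${1##*/cuda-}"
}

list_toolkit_dirs() {
  local d
  for d in /usr/local/cuda-*; do
    [ -d "$d" ] && echo "toolkit $(toolkit_version_from_dir "$d") in $d"
  done
  return 0
}

list_runfile_uninstallers() {
  local f
  for f in /usr/bin/nvidia-uninstall /usr/local/cuda-*/bin/uninstall_cuda_*.pl; do
    [ -e "$f" ] && echo "runfile uninstaller: $f"
  done
  return 0
}

# Installed dpkg packages whose names start with nvidia, cuda or libcuda.
list_nvidia_packages() {
  command -v dpkg-query >/dev/null 2>&1 || return 0
  dpkg-query -W -f='${Status} ${Package} ${Version}\n' 2>/dev/null |
    awk '$3 == "installed" && $4 ~ /^(nvidia|cuda|libcuda)/ { print "package: " $4 " " $5 }'
  return 0
}

inventory() {
  local runfile packages
  list_toolkit_dirs
  runfile=$(list_runfile_uninstallers)
  packages=$(list_nvidia_packages)
  [ -n "$runfile" ] && echo "$runfile"
  [ -n "$packages" ] && echo "$packages"
  if [ -n "$runfile" ] && [ -n "$packages" ]; then
    echo "note: both runfile and package-manager installs are present"
  fi
  return 0
}

inventory
```

If the last note appears, the machine's "history" includes both installation methods, which is exactly the situation where a clean reinstall is the more predictable option.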