Samples 7.5 - Driver 364.19 - Runtime Error: Insufficient Driver

I recently installed the CUDA 7.5 samples on the following setup:

OS: Ubuntu 14.04
GPU: GTX 660
GPU Driver: 364.19

To force the samples to compile, I had to create a softlink from /usr/lib/nvidia-364 to /usr/lib/nvidia-352; that worked. (I include this for completeness.)

But when I try to execute any sample, CUDA throws the following error:

== RUNTIME ERROR ==
CUDA error at …/…/common/inc/helper_cuda.h:1111 code=35(cudaErrorInsufficientDriver) “cudaGetDeviceCount(&device_count)”
== END ==

Meanwhile, nvidia-smi returns:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 364.19     Driver Version: 364.19                                |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 660     Off  | 0000:01:00.0     N/A |                  N/A |
| 36%   50C    P0    N/A /  N/A |    256MiB /  2042MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0                  Not Supported                                         |
+-----------------------------------------------------------------------------+

Does anyone have an idea why the runtime would fail on this error when my driver is clearly sufficient?

thanks,

richard

Your driver is probably not installed correctly. I can't tell you exactly what the problem is, since you haven't detailed how you installed the 352 and 364 drivers, but this:

“Where I had to force the compile of the samples by creating a softlink from /usr/lib/nvidia-364 to /usr/lib/nvidia-352, this worked. (I include this to be complete.)”

is definitely abnormal, and indicative of an improperly set-up system.

On Ubuntu 14.04, I installed the driver using the standard commands:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-364
sudo reboot

And there are no other complaints with this driver, so it seems okay to me.

When I installed the CUDA SDK, the installed driver was 331, but the SDK requires 352 or newer. I tried to install the driver bundled with the SDK, but the installer refused because X11 was running. Some time later I installed v361, as described above, and then went to compile the samples.

But the -L option the build specified was /usr/lib/nvidia-352, not the /usr/lib/nvidia-361 directory that was automatically installed with the 361 driver.

So rather than the driver being the problem, it's the samples and SDK that assume v352 of the driver (and its libraries) is installed. Hence the softlink from 352 to 361.
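To see which build files in the samples hard-code the 352 directory (instead of working around it with a softlink), a recursive grep is enough. This is only a sketch: the helper name find_driver_refs is made up, and ~/NVIDIA_CUDA-7.5_Samples is what I believe the installer uses by default, so set SAMPLES_DIR if yours lives elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: list files under DIR that mention "nvidia-BRANCH".
# Usage: find_driver_refs DIR BRANCH
find_driver_refs() {
    # -r: recurse, -l: print only file names, -F: treat pattern as fixed text
    grep -rlF -- "nvidia-$2" "$1" 2>/dev/null | sort
}

# Assumed default samples location; override with SAMPLES_DIR=/some/path.
SAMPLES_DIR="${SAMPLES_DIR:-$HOME/NVIDIA_CUDA-7.5_Samples}"
if [ -d "$SAMPLES_DIR" ]; then
    find_driver_refs "$SAMPLES_DIR" 352
fi
```

As far as I can tell these paths only matter at link time, so editing them changes how the samples build, not which driver libraries they load when run.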

But what's happening at run time? Does the SDK assume a driver version instead of querying the OS for the version actually installed?

What might happen if I uninstalled the whole thing and installed it again, now that the driver has been upgraded to 361? Does the installer detect the version at install time and then hard-code it down the line?

r-

I would recommend following a coherent install sequence, as covered in the Linux installation guide:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#abstract

An r352 driver is compatible with CUDA 7.5, so it's not as simple as that being the problem. It's likely that you have components of different drivers installed and conflicting with each other, which gives the runtime the indication that the driver is incorrect. You may still have components of r331 lying around.

The installation guide gives suggestions for how to remove old components.
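One quick way to spot leftovers from more than one driver branch (say, r331 pieces next to r364) is to list the installed nvidia-NNN and libcuda1-NNN packages and see how many distinct branch numbers show up. A sketch, assuming a dpkg-based system like Ubuntu 14.04; the helper name driver_branches is made up.

```shell
#!/bin/sh
# Hypothetical helper: read package names on stdin and print the distinct
# branch numbers found in nvidia-NNN* and libcuda1-NNN* names.
driver_branches() {
    sed -n -e 's/^nvidia-\([0-9][0-9]*\).*/\1/p' \
           -e 's/^libcuda1-\([0-9][0-9]*\).*/\1/p' | sort -u
}

# Only packages whose dpkg status is "... ok installed".
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Status} ${Package}\n' \
        | awk '$3 == "installed" { print $4 }' \
        | driver_branches
fi
```

More than one number in the output suggests mixed driver components; the installation guide's removal steps are the place to go from there.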

You could also do:

dmesg | grep NVRM

to see if the driver is reporting an issue.
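The "loading ... Kernel Module" line in that output also tells you which kernel module version is actually running, which is worth comparing with the user-space libraries. A small sketch (nvrm_module_version is a hypothetical helper) that pulls the version out of that banner; the same banner text is in /proc/driver/nvidia/version while the module is loaded.

```shell
#!/bin/sh
# Hypothetical helper: read NVRM log lines on stdin and print the version
# from the first "... Kernel Module 364.19 ..." banner it finds.
nvrm_module_version() {
    sed -n 's/.*Kernel Module *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Prefer /proc (present while the module is loaded); fall back to dmesg,
# which may need sudo on some systems.
if [ -r /proc/driver/nvidia/version ]; then
    nvrm_module_version < /proc/driver/nvidia/version
else
    dmesg 2>/dev/null | grep NVRM | nvrm_module_version
fi
```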

Output of the dmesg command:

[ 19.499707] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 364.19 Tue Apr 19 14:44:55 PDT 2016
[ 21.333328] NVRM: Your system is not currently configured to drive a VGA console
[ 21.333331] NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
[ 21.333333] NVRM: requires the use of a text-mode VGA console. Use of other console
[ 21.333334] NVRM: drivers including, but not limited to, vesafb, may result in
[ 21.333335] NVRM: corruption and stability problems, and is not supported.

Is that an error, or just an FYI to be ignored?

All r331 refs were completely removed by the r364 install.
LD_LIBRARY_PATH specifies /usr/local/cuda-7.5/lib64.

Before I go scouring the code: what exactly does the CUDA library check to determine the installed driver?
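I can't say what the runtime does internally, but it has to reach the driver through libcuda.so.1, so a sensible thing to verify is whether that library exists at all and whether its version matches the loaded kernel module. A minimal sketch: libcuda_version is a hypothetical helper, and it assumes the real file follows the libcuda.so.<version> naming that NVIDIA's packages typically use.

```shell
#!/bin/sh
# Hypothetical helper: given a path like .../libcuda.so.364.19, print 364.19
# (prints nothing for unversioned names such as libcuda.so.1).
libcuda_version() {
    basename "$1" | sed -n 's/^libcuda\.so\.\([0-9][0-9]*\.[0-9.]*\)$/\1/p'
}

# Which libcuda.so.1 does the loader cache offer? (first x86-64 entry)
lib=$(ldconfig -p 2>/dev/null | awk '/libcuda\.so\.1 .*x86-64/ { print $NF; exit }')
if [ -z "$lib" ]; then
    echo "no libcuda.so.1 in the loader cache"
else
    real=$(readlink -f "$lib")
    echo "libcuda.so.1 -> $real (version: $(libcuda_version "$real"))"
fi

# Kernel module version, for comparison.
if [ -r /proc/driver/nvidia/version ]; then
    head -n 1 /proc/driver/nvidia/version
fi
```

Keep in mind that directories in LD_LIBRARY_PATH are searched before the ldconfig cache, so a libcuda.so.1 sitting in one of those directories would take precedence over what this reports.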

Answering my own question re: the NVRM messages:

These messages relate to GRUB and the console, and are not related to the NVIDIA driver as such.

So it seems that dmesg indicates there is no issue with my driver.

Anything else I can try?

Searching, searching and more searching, and I finally found an interesting tip:

When updating the driver using apt-get and friends, you must also install some extras:

nvidia-364
libcuda1-364

(or whichever version-appropriate package names you require)
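To confirm that both packages are actually installed for your branch on a dpkg-based system like 14.04, something like the following works. The helper name pkg_installed and the default BRANCH=364 are assumptions; set BRANCH to whichever nvidia-NNN you installed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a Debian/Ubuntu package is fully installed.
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'ok installed$'
}

BRANCH="${BRANCH:-364}"   # assumption: the nvidia-NNN branch you installed
for pkg in "nvidia-$BRANCH" "libcuda1-$BRANCH"; do
    if pkg_installed "$pkg"; then
        echo "$pkg: installed"
    else
        echo "$pkg: MISSING (try: sudo apt-get install $pkg)"
    fi
done
```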

It sure would be nice if the error message were actually accurate. (What's so hard about recognizing that a required dependency doesn't exist? It's the oldest problem in the book, and we still can't get it right.)

But then, that wouldn't go very far to further our reputation as programmers, now would it?

You have a typo there: it's libcuda1-364. But yeah, it seems that CUDA ships with libcuda for the 352 driver, and you need to update it manually if you want to use it with a newer driver…

NVIDIA really needs to release a new version of CUDA for Linux, if only for 16.04 support. Hell, I'm on 14.04 and a recent kernel update broke 352 for whatever reason, so now I'm left making softlinks just to compile the samples with a newer driver and whatnot.

Thanks, typo fixed…