CUDA 6.5 deviceQuery fails even though the NVIDIA Tesla driver is properly installed

We have an HP SL250s Gen8 with two Tesla M2090 GPU blades installed. It runs RHEL 6.5, but the site that installed the base OS updated the kernel via RHN to the following version:

2.6.32-431.23.3.el6.x86_64

The driver was installed after the kernel update and is the latest available Tesla driver - just downloaded today.

CUDA was installed via the “yum” repo method using cuda-repo-rhel6-6.5-14.x86_64.rpm, and no errors were reported at installation time.

After installing the samples and running “make”, we ran our usual deviceQuery test, only to get this output:

[root@mn318 release]# ./deviceQuery
./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

FATAL: Could not open ‘/lib/modules/2.6.32-431.23.3.el6.x86_64/kernel/drivers/video/nvidia-uvm.ko’: No such file or directory
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

In that directory, only the “nvidia” module is present.
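The FATAL line appears to be modprobe failing to find the nvidia-uvm kernel module for the running kernel. The Python sketch below checks for that file quickly. It assumes the module lives at the exact path shown in the error message (/lib/modules/<kernel release>/kernel/drivers/video/nvidia-uvm.ko). Other driver install methods may place it elsewhere. The function names are hypothetical helpers, not part of any NVIDIA tool.

```python
import os
import platform

# Path layout copied from the FATAL message in the post (an assumption:
# other driver install methods may place the module elsewhere).
UVM_RELATIVE_PATH = os.path.join("kernel", "drivers", "video", "nvidia-uvm.ko")


def uvm_module_path(kernel_release, modules_root="/lib/modules"):
    """Return the expected nvidia-uvm.ko path for a given kernel release."""
    return os.path.join(modules_root, kernel_release, UVM_RELATIVE_PATH)


def has_uvm_module(kernel_release=None, modules_root="/lib/modules"):
    """True if nvidia-uvm.ko exists for the given (default: running) kernel."""
    if kernel_release is None:
        kernel_release = platform.release()  # same string as `uname -r`
    return os.path.isfile(uvm_module_path(kernel_release, modules_root))


if __name__ == "__main__":
    release = platform.release()
    path = uvm_module_path(release)
    print(f"{path}: {'present' if has_uvm_module(release) else 'MISSING'}")
```

On the machine in this post, the script would be expected to report the path as MISSING, because only the “nvidia” module is present in that directory.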

Any ideas on this one? Many thanks in advance!

Your driver install is broken. It’s not clear which GPU driver you installed: “is the latest available Tesla driver - just downloaded today” doesn’t tell me where you downloaded it from or which version it was. Also, last time I checked, no standalone Tesla driver supporting CUDA 6.5 was available yet; you have to use the one bundled with the CUDA 6.5 runfile installer (340.29). (I just tried the driver wizard on nvidia.com, selecting the “latest” 64-bit Linux driver for the M2090, and got 331.89, which does not support CUDA 6.5.) I suggest downloading the 64-bit Linux CUDA 6.5 runfile installer and using that.
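To confirm which driver branch is actually loaded, you can read the kernel module's version string. The Python sketch below does this under two assumptions. First, the first line of /proc/driver/nvidia/version is assumed to look like "NVRM version: NVIDIA UNIX x86_64 Kernel Module  340.29  <build date>". Second, the minimum branch of 340 is taken from this thread's statement that CUDA 6.5 needs a 340.xx or newer driver. The helper names are hypothetical.

```python
import re

# Minimum driver branch for CUDA 6.5, per this thread (340.xx or newer).
CUDA_65_MIN_DRIVER = 340

# Assumed format of /proc/driver/nvidia/version, e.g.
# "NVRM version: NVIDIA UNIX x86_64 Kernel Module  340.29  <build date>"
_VERSION_RE = re.compile(r"Kernel Module\s+(\d+)\.(\d+)")


def parse_driver_version(text):
    """Return (major, minor) parsed from the version text, or None."""
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_supports_cuda_65(text, minimum=CUDA_65_MIN_DRIVER):
    """True if the loaded kernel driver is at or above the required branch."""
    version = parse_driver_version(text)
    return version is not None and version[0] >= minimum


if __name__ == "__main__":
    try:
        with open("/proc/driver/nvidia/version") as f:
            text = f.read()
    except OSError:
        print("NVIDIA kernel module not loaded (no /proc/driver/nvidia/version)")
    else:
        print(parse_driver_version(text), driver_supports_cuda_65(text))
```

With the 331.89 driver from the download wizard, the check would be expected to fail. With the 340.29 driver bundled in the runfile, it would be expected to pass.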

Note the first question in the FAQ here:

https://developer.nvidia.com/cuda-downloads

"Q: Are the latest NVIDIA drivers included in the CUDA Toolkit installers?
A: For convenience, the installer packages on this page include NVIDIA drivers which support application development for all CUDA-capable GPUs supported by this release of the CUDA Toolkit. If you are deploying applications on NVIDIA Tesla products in a server or cluster environment, please use the latest recommended Tesla driver that has been qualified for use with this version of the CUDA Toolkit. If a recommended Tesla driver is not yet available, please check back in a few weeks. "

So use this runfile installer:

http://developer.download.nvidia.com/compute/cuda/6_5/rel/installers/cuda_6.5.14_linux_64.run

along with its included driver, until a suitable (340.xx or newer) driver is available separately for Tesla products.

Indeed, the older driver you mentioned is the one I’m running. I will try installing with the .run file and its included driver.

First, I removed CUDA the same way I had installed it, with “yum remove cuda”. I then ran the .run script, but it complained about the previously installed driver. The installer log listed the “yum remove” commands needed to remove those driver components.

After running those, I re-ran the CUDA installer and included the driver.

nvidia-smi -p reports that I am now using 340.29.

I can’t tell if you’re still having trouble.

Is deviceQuery working now?

CUDA 6.5 requires a proper install of a 340.xx or newer driver.
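The same requirement can be checked from the CUDA side. The documented cudaDriverGetVersion and cudaRuntimeGetVersion calls each fill an int with 1000*major + 10*minor, so 6050 means 6.5. cudaDriverGetVersion reports 0 when no driver is installed. The Python sketch below calls them through ctypes and flags a driver older than the runtime. It assumes libcudart can be located by ctypes.util.find_library or passed in by full path, for example from the toolkit's lib64 directory. The helper names are hypothetical.

```python
import ctypes
import ctypes.util


def decode_cuda_version(value):
    """Decode CUDA's integer version (1000*major + 10*minor), e.g. 6050 -> (6, 5)."""
    return value // 1000, (value % 1000) // 10


def query_versions(libname=None):
    """Return (driver, runtime) CUDA versions via libcudart, or None if unavailable."""
    path = libname or ctypes.util.find_library("cudart")
    if path is None:
        return None
    try:
        cudart = ctypes.CDLL(path)
    except OSError:
        return None
    driver, runtime = ctypes.c_int(0), ctypes.c_int(0)
    # Both calls fill an int*; a driver value of 0 means no usable driver.
    cudart.cudaDriverGetVersion(ctypes.byref(driver))
    cudart.cudaRuntimeGetVersion(ctypes.byref(runtime))
    return driver.value, runtime.value


def driver_is_sufficient(driver, runtime):
    """The driver must support at least the runtime's CUDA version."""
    return driver != 0 and driver >= runtime


if __name__ == "__main__":
    versions = query_versions()
    if versions is None:
        print("libcudart not found; pass its full path to query_versions()")
    else:
        drv, rt = versions
        status = "OK" if driver_is_sufficient(drv, rt) else "driver too old"
        print("driver", decode_cuda_version(drv), "runtime", decode_cuda_version(rt), status)
```

This reports the same driver/runtime pair that deviceQuery prints on success. It is a convenient sanity check before digging into kernel modules.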

Sorry for the late reply due to the U.S. holiday.

YES, and thank you very much. This was indeed the one and only problem: the ‘latest driver’ available via the download wizard wasn’t as new as the driver embedded in the cuda*.run installation file!

A surprising and unexpected problem, really. I would expect the latest drivers to also be available through the standalone download wizard, without having to install CUDA.

However, everything is now working as expected, and the samples build and run well.