TensorFlow error after confirming successful CUDA/cuDNN installation

Hello All,

I have been browsing webpages and posts for days and have tried every solution I could find, without success.
The system settings:
GPU: Tesla M60; OS: Ubuntu 16.04; NVIDIA driver: 375.26; CUDA Toolkit: 8.0; cuDNN: tried both 6.0 and 7.2
TensorFlow-GPU: 1.10.0, installed through Anaconda

I have passed deviceQuery and bandwidthTest for CUDA, as well as the MNIST sample for cuDNN. However, when I try to start a TensorFlow session, I get the following error:
“Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version”.

I have confirmed through the nvidia-smi command that both the CUDA driver and runtime versions are listed as 8.0. I have also checked that CUDA 8.0 is compatible with NVIDIA driver 375 (in fact, I installed the NVIDIA driver through CUDA’s runfile). Could anyone kindly provide some insight into this problem?

Meanwhile, here are several questions that I couldn’t answer despite searching online:

  1. Is there a restriction on which CUDA versions are compatible with the Tesla M60 card?
    The Tesla M60 has compute capability 5.3, so it is a Maxwell-architecture card. The guide at https://docs.nvidia.com/cuda/maxwell-compatibility-guide/index.html only mentions compatibility up to CUDA 7.0 and gives no concrete specification of supported CUDA versions. I believe CUDA 8.0 is compatible, since I have successfully run TensorFlow on a GTX 750 Ti with CUDA 8.0 on Windows, and the 750 Ti also has compute capability 5.3.

  2. Which installation method should be used for the NVIDIA driver?
    This is the most confusing part, as three methods are widely mentioned without any comparison:
    a). Install the NVIDIA driver first, from the NVIDIA driver download webpage.
    b). Install the NVIDIA driver first, through sudo apt-get (for Ubuntu).
    c). Install the NVIDIA driver through CUDA’s runfile.
    I understand that b) lags behind a) in terms of available versions. However, some posts suggest that certain errors occur with one method but not another. How should we choose among these three methods?

  3. Is there a version compatibility list for the NVIDIA driver, CUDA, and cuDNN?
    I couldn’t find a definitive list on the official webpages, even though such information is vital. I could only find a list in one forum post indicating the latest NVIDIA driver version that each CUDA version supports. It would be very helpful if this information were available in an organized form from the official source.

Thanks a lot to anyone who can address these questions.

The tensorflow you are using is built against CUDA 9 and therefore won’t work with your CUDA 8 setup.

Tesla M60 is not compute capability 5.3. It is 5.2.

There are multiple methods to install a GPU driver.
Here is a recent driver for Tesla M60 on Ubuntu 16.04:

(you can always find drivers by using the driver wizard at http://www.nvidia.com/drivers )
(you can always find CUDA toolkit installers at http://www.nvidia.com/getcuda )
(and note that the install guides are linked from the download pages; I suggest reading a CUDA Linux install guide)
https://www.nvidia.com/drivers/results/136950

It will work with CUDA 9 and you should be able to install and use CUDA 9 on Tesla M60 if you wish.

The official source for driver/cuda toolkit compatibility is Table 1 here:

https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#major-components

For CUDNN compatibility, just study the options on the download page for CUDNN. They indicate which CUDA versions they were built for or support.
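
A compatibility table like that can be turned into a quick check. The following is only a sketch: the helper names are made up for illustration, and the two entries are simply the driver versions that shipped in the CUDA 8.0 and CUDA 9.0 runfiles mentioned in this thread, so confirm them (and add further rows) from Table 1 before relying on the result.

```python
# Illustrative minimum-driver lookup. The values below are the driver versions
# bundled with the CUDA runfiles mentioned in this thread (an assumption to be
# verified against Table 1 of the CUDA Toolkit release notes).
MIN_DRIVER = {
    "8.0": "375.26",
    "9.0": "384.81",
}


def parse_driver_version(text):
    """Turn a driver string such as '384.81' into a comparable tuple (384, 81)."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(cuda_version, driver_version):
    """Return True if driver_version meets the listed minimum for cuda_version."""
    if cuda_version not in MIN_DRIVER:
        raise KeyError("no entry for CUDA %s; look it up in Table 1" % cuda_version)
    required = parse_driver_version(MIN_DRIVER[cuda_version])
    return parse_driver_version(driver_version) >= required


if __name__ == "__main__":
    # Driver 375.26 against a CUDA 9.0 runtime, as in the original setup.
    print(driver_supports("9.0", "375.26"))
```

Comparing tuples of integers rather than raw strings avoids mistakes such as treating "384.9" as newer than "384.81".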

Hello txbob, thanks for your reply! I am wondering how to obtain the information that tensorflow 1.10.0 is not compatible with CUDA versions before 9.0? Thanks!

https://www.tensorflow.org/install/install_sources#tested_source_configurations

Version                 CPU/GPU   Python version   Compiler   Build tools    cuDNN   CUDA
tensorflow-1.10.0       CPU       2.7, 3.3-3.6     GCC 4.8    Bazel 0.15.0   N/A     N/A
tensorflow_gpu-1.10.0   GPU       2.7, 3.3-3.6     GCC 4.8    Bazel 0.15.0   7       9

To be clear, I didn’t say “tensorflow 1.10.0 is not compatible with CUDA versions before 9.0”, I said “The tensorflow you are using is built against CUDA 9 and therefore won’t work with your CUDA 8 setup.”

The version of TF you are using, installed via Anaconda, is built against CUDA 9. TF can be built (from sources) against different versions of CUDA, but the prebuilt binaries from trusted sources are usually built according to the table already given above.
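
If you want to see what a particular prebuilt binary was built against, one approach is to list the shared libraries it links to and read the version suffixes. The sketch below assumes the binary links dynamically against versioned library names such as libcudart.so.9.0 and libcudnn.so.7; the function names are hypothetical, and the path to TensorFlow's shared object depends on your installation, so it is left as an argument.

```python
import re
import subprocess

# Matches versioned CUDA library names such as "libcudart.so.9.0".
_CUDA_LIB = re.compile(r"lib(cudart|cudnn|cublas)\.so\.([0-9.]+)$")


def linked_cuda_libs(ldd_output):
    """Extract {library: version} for CUDA libraries named in ldd output text."""
    found = {}
    for line in ldd_output.splitlines():
        # Only look at the requested name, i.e. the part before "=>".
        name = line.split("=>")[0].strip()
        match = _CUDA_LIB.match(name)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def run_ldd(shared_object_path):
    """Run ldd on a shared object (Linux only) and return its text output."""
    output = subprocess.check_output(["ldd", shared_object_path])
    return output.decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = (
        "\tlinux-vdso.so.1 (0x00007ffd)\n"
        "\tlibcudart.so.9.0 => not found\n"
        "\tlibcudnn.so.7 => /usr/lib/x86_64-linux-gnu/libcudnn.so.7 (0x00007f)\n"
    )
    print(linked_cuda_libs(sample))
```

In practice you would pass run_ldd() the path of the shared object inside your TensorFlow package directory. A "not found" entry in the ldd output also tells you that the loader cannot locate that library version on the current system.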

Note that tensorflow is not an NVIDIA product.

Hello txbob, thanks again for your prompt reply.

Yet after I installed NVIDIA driver 396 and CUDA 9.0 and built the samples, I encountered the following error in bandwidthTest:
Running on…

Device 0: Tesla M60
Quick Mode

CUDA error at bandwidthTest.cu:730 code=46(cudaErrorDevicesUnavailable) "cudaEventCreate(&start)"

I have seen this error before whenever I installed the NVIDIA driver from a source other than the CUDA runfile (which is why I asked about the difference between installing the driver from CUDA’s runfile and installing it separately). I am also wondering why installing the NVIDIA driver through the CUDA runfile prompts me to install gcc and make, while installing it through the deb file from the official driver download page does not (does it install gcc and make automatically?).

Searching online for this code 46 error did not turn up a solution that resolves my case.

Update: I restarted from scratch and installed CUDA 9.0 and the NVIDIA driver from the runfile (384.81), plus cuDNN v7. I then proceeded as before with Anaconda 5.2 and tensorflow-gpu 1.10.0.
Yet the same error occurs again:
Internal: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version

Any insights would be highly appreciated!

So I have tried with CUDA 9.0 and still had no success. Are there any more insights into what could be causing the “CUDA driver version is insufficient for CUDA runtime version” error?

I have been stuck here for a while, so any support would be appreciated!
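
For anyone debugging the same message, it can help to compare the CUDA version the installed driver supports with the version of the runtime library being loaded, outside of TensorFlow. The following is a minimal sketch using ctypes. It assumes a Linux system where the loader can find libcuda.so.1 and libcudart.so (the unversioned name is an assumption and usually needs the toolkit's lib64 directory on LD_LIBRARY_PATH), and the helper names are made up for illustration. Note that the runtime found this way is not necessarily the one TensorFlow itself loads.

```python
import ctypes


def decode_cuda_version(raw):
    """Split CUDA's integer encoding (1000 * major + 10 * minor) into (major, minor)."""
    return raw // 1000, (raw % 1000) // 10


def query_version(library_name, function_name):
    """Call a CUDA version query and return (major, minor), or None if unavailable."""
    try:
        library = ctypes.CDLL(library_name)
        function = getattr(library, function_name)
    except (OSError, AttributeError):
        return None  # library could not be loaded, or the symbol is missing
    raw = ctypes.c_int(0)
    status = function(ctypes.byref(raw))
    if status != 0:  # both APIs return 0 on success
        return None
    return decode_cuda_version(raw.value)


def driver_is_sufficient(driver, runtime):
    """The driver must support a CUDA version at least as new as the runtime's."""
    return driver >= runtime


if __name__ == "__main__":
    driver = query_version("libcuda.so.1", "cuDriverGetVersion")
    runtime = query_version("libcudart.so", "cudaRuntimeGetVersion")
    print("driver supports CUDA:", driver)
    print("runtime version:", runtime)
    if driver is not None and runtime is not None:
        print("driver sufficient:", driver_is_sufficient(driver, runtime))
```

If the runtime reported here is newer than what the driver supports, that matches the condition described by the error text. If the two agree, the next thing to check is which runtime library TensorFlow actually picks up in your environment.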