I have updated the drivers for my GTX 970 under Ubuntu 16.04 from version 367.57 to version 375.26. Now when I run nvidia-smi I get:
Failed to initialize NVML: Driver/library version mismatch
And programs using TensorFlow don't find the GPU anymore; I get these errors:
E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: Ono-Sendai
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: Ono-Sendai
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 375.26.0
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:363] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 367.57 Mon Oct 3 20:37:01 PDT 2016
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
"""
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 367.57.0
E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:303] kernel version 367.57.0 does not match DSO version 375.26.0 -- cannot find working devices in this configuration
After I reverted to version 367.57 of the drivers, the errors went away, and nvidia-smi and TensorFlow work fine. So what should I do to use the most recent driver, 375.26? Perhaps I need to install a more recent version of the CUDA toolkit and/or cuDNN?
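For reference, here is how I compared the two versions the log complains about; the library path is an assumption from my setup and may differ depending on the install method:

    # kernel side: version reported by the loaded NVIDIA module
    cat /proc/driver/nvidia/version
    # user-space side: version of the installed libcuda (path is an assumption)
    ls -l /usr/lib/x86_64-linux-gnu/libcuda.so*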
Install them correctly. The correct install method depends on how the previous driver was installed; you'll get a sense of this if you read the Linux install guide.
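For example, a rough way to tell how the current driver got onto the system (the exact paths are assumptions and may vary):

    # a repository (.deb) install shows up as packages
    dpkg -l | grep -i nvidia
    # a runfile install usually leaves the installer's uninstaller and log behind
    ls /usr/bin/nvidia-uninstall /var/log/nvidia-installer.log 2>/dev/null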
Mmm, it seems the container must be rebuilt on a driver change, which makes sense.
Full procedure was (rough command sketch below the list):
1. install CUDA (which automatically installs the unwanted newest driver)
2. remove the new driver with apt-get remove nvidia-driverxxxx
3. install the old driver from the runfile
4. rebuild the container
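Roughly, the commands looked like this; the package name, runfile name, and image tag are assumptions, so adjust them to your versions and setup:

    # 1. install CUDA from the NVIDIA apt repo (pulls in the newest driver as a dependency)
    sudo apt-get install cuda
    # 2. remove the unwanted newer driver it brought in (package name is an assumption)
    sudo apt-get remove --purge 'nvidia-375*'
    # 3. reinstall the older driver from its runfile (file name is an assumption)
    sudo sh NVIDIA-Linux-x86_64-367.57.run
    # 4. rebuild the GPU container so it matches the installed driver (image tag is an assumption)
    docker build -t tf-gpu .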