"NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver" Ubuntu 16.04

kmbutler, I also have a GTX 1080 on Ubuntu 16.04. I used the network install .deb file from the CUDA Toolkit downloads page (NVIDIA Developer) to install the NVIDIA repo, and everything is working fine for me (the package manager downloaded the 390 driver).

DerekJuba- Good to know. I’ll exclusively use the network install .deb in the future.

pramarao - As of today (21 Feb 2018) the standalone .deb file still has issues: it replaces the R390 driver with R387 on my GT520.

GT520 is a Fermi device, so you are going to discover that it is not supported by CUDA 9.

In any event, if you use the network .deb installer, it should not replace an r390 driver with an r387 one.

pramarao:
Currently the link provided for the network install (NVIDIA Driver Downloads) just shows a “System error”.

I can’t find a .deb, only .run files…
And attempting to use the .run asks that I uninstall all prior NVIDIA drivers…

@dartdog: Did you follow the instructions listed at the URL on the ‘Additional Information’ tab? After you add the repository keys and the repository into your package manager lists, then you would use the “cuda-drivers” meta-package to install the NVIDIA driver using our network repo.

Hope that helps.

This shows at the link… Did you try the link yourself?
I would insert the screen shot but it does not look possible?

System Message

Please try again at a later time. Sorry for the inconvenience.

you may find .deb here, most likely https://developer.nvidia.com/cuda-toolkit
otherwise you may add

sudo add-apt-repository ppa:graphics-drivers/ppa

Reference:
https://askubuntu.com/questions/967332/how-can-i-install-cuda-9-on-ubuntu-17-10

NVIDIA does not maintain the graphics-drivers ppa. I suggested the steps listed in the “Additional Information” tab on the page I linked above. For reference, here are the steps:

Update the CUDA network repo keys using the following command

# sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub

Add the CUDA network repo and update the package lists on your system to get new versions of the software and their dependencies.

# sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
# sudo apt-get update

If you already have CUDA installed on your instance and only need to update the NVIDIA driver, install the cuda-drivers meta-package. Then reboot to complete the installation of the 390.30 NVIDIA driver.

# sudo apt-get -y install cuda-drivers
# sudo reboot

If you also need to install the CUDA toolkit, then install the cuda-toolkit-9-1 meta-package to download and install CUDA 9.1.

# sudo apt-get -y install cuda-toolkit-9-1

Hope this helps.
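
For anyone following the steps above, here is a minimal sketch for confirming that the repo entry actually landed in an APT sources file before running apt-get update. The function name has_cuda_repo is hypothetical (not an NVIDIA tool), and the host pattern is simply taken from the repo URL quoted above.

```shell
# Hypothetical helper: succeed if an APT sources file contains an active
# (non-commented) deb entry pointing at the NVIDIA CUDA network repo.
has_cuda_repo() {
    local list_file="$1"
    # A missing or unreadable file counts as "not configured"
    [ -r "$list_file" ] || return 1
    # Lines starting with '#' do not match because the pattern is anchored on "deb"
    grep -Eq '^[[:space:]]*deb[[:space:]].*developer\.download\.nvidia\.com/compute/cuda/repos/' "$list_file"
}

# Example call (path as used in the steps above):
#   has_cuda_repo /etc/apt/sources.list.d/cuda.list && echo "CUDA repo configured"
```

Once the entry is present and the package lists are updated, apt-cache policy cuda-drivers should show which driver version APT would install from the repo.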

Thank you, that seems to have gotten things back on track.
However, as a caution for those coming here to upgrade for TF 1.6: TF 1.6 requires cudnn 9.0, not the 9.1 that this gets you, and earlier versions of TF won’t run with this either…

Sorry, I should have said CUDA 9.0 vs 9.1, not cudnn.

cuDNN 7.1 typically goes with CUDA 9.1.
What about building TensorFlow 1.6 from source?
https://github.com/tensorflow/tensorflow/releases

FWIW, I just added this to my install:

sudo apt-get -y install cuda-toolkit-9-0

and TF 1.6 is up and running. Presumably 9-1 is still there, so when TF upgrades its dependency I should be good.
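
A rough sketch for checking that both toolkits really are installed side by side. It assumes the deb packages put each toolkit in its own versioned directory under /usr/local (cuda-9.0, cuda-9.1) with a cuda symlink pointing at the active one; the function names are made up for illustration.

```shell
# Print the versioned toolkit directories found under a prefix (default /usr/local).
list_cuda_toolkits() {
    local prefix="${1:-/usr/local}"
    local dir
    for dir in "$prefix"/cuda-*; do
        # Skip the literal pattern when the glob matches nothing
        [ -d "$dir" ] || continue
        basename "$dir"
    done
}

# Print which versioned directory the "cuda" symlink resolves to, if it exists.
active_cuda_toolkit() {
    local prefix="${1:-/usr/local}"
    [ -L "$prefix/cuda" ] || return 1
    basename "$(readlink -f "$prefix/cuda")"
}

# Example calls:
#   list_cuda_toolkits
#   active_cuda_toolkit
```

If TensorFlow needs a toolkit other than the one the symlink points to, its library directory has to be on the loader path explicitly; which directory that is depends on your install.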

I’m getting the same error!!! Every now and then my computer gets into the dreaded “login loop” because of this.

Problem:
$ nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Any idea how to fix it?
I’ve also posted a question on Ask Ubuntu; you can reply there too:
https://askubuntu.com/questions/1015666/login-loop-ubuntu-16-04-nvidia

Possibly a bad idea. You won’t find any instructions anywhere from NVIDIA that suggest that you should install a driver for a GPU for CUDA use, from the ppa repository. I think I’ve already indicated this in this thread.

Ok, thanks for your comment.
Can you propose potential solutions to the problem or point to the concrete answer where they are?

You may remove all cuda/drivers and try again.
For reference:
Installation Guide Linux :: CUDA Toolkit Documentation
[url]https://devtalk.nvidia.com/default/topic/1000340/cuda-setup-and-installation/-quot-nvidia-smi-has-failed-because-it-couldn-t-communicate-with-the-nvidia-driver-quot-ubuntu-16-04/post/5110945/#5110945[/url]

get your installers from here: [url]http://www.nvidia.com/getcuda[/url]
read the Linux install guide: Installation Guide Linux :: CUDA Toolkit Documentation

My recommendation at this time is, on a fresh OS install, to use the CUDA 9.1 network deb install method. Don’t use a runfile installer, don’t use a local deb installer.

Plenty of threads with this recommendation if you care to look around, such as this one:

[url]https://devtalk.nvidia.com/default/topic/1031158/cuda-setup-and-installation/crash-in-driver-build-during-9-1-install-on-ubuntu/[/url]

Ok. Based on andrey’s suggestions I did the following:

$ sudo apt-get purge nvidia-*
$ sudo apt autoremove
$ sudo apt-get install cuda
$ sudo reboot

The same result still persists afterwards:

$ nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Any ideas?

Following txbob’s suggestions, I still don’t have a solution.

Here is what I did:

$ sudo apt-get install linux-headers-$(uname -r)
$ sudo apt-get --purge remove nvidia-*
$ sudo apt autoremove

this is the driver for GTX 1080 from: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=debnetwork

$ sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install cuda
$ nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

these are the packages I have installed.

$ dpkg -l | grep -i nvidia
ii bbswitch-dkms 0.8-3ubuntu1 amd64 Interface for toggling the power on NVIDIA Optimus video cards
ii cuda-nvtx-9-1 9.1.85-1 amd64 NVIDIA Tools Extension
ii libcuda1-390 390.30-0ubuntu1 amd64 NVIDIA CUDA runtime library
ii libcupti-dev:amd64 7.5.18-0ubuntu1 amd64 NVIDIA CUDA Profiler Tools Interface development files
ii libcupti-doc 7.5.18-0ubuntu1 all NVIDIA CUDA Profiler Tools Interface documentation
ii libcupti7.5:amd64 7.5.18-0ubuntu1 amd64 NVIDIA CUDA Profiler Tools Interface runtime library
ii nvidia-390 390.30-0ubuntu1 amd64 NVIDIA binary driver - version 390.30
ii nvidia-390-dev 390.30-0ubuntu1 amd64 NVIDIA binary Xorg driver development files
ii nvidia-modprobe 390.30-0ubuntu1 amd64 Load the NVIDIA kernel driver and create device files
ii nvidia-opencl-icd-390 390.30-0ubuntu1 amd64 NVIDIA OpenCL ICD
ii nvidia-prime 0.8.2 amd64 Tools to enable NVIDIA’s Prime
ii nvidia-settings 390.30-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
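
Since the list shows the 390.30 driver packages as installed, one possible next check (a suggestion, not something stated earlier in the thread) is whether the kernel module is actually loaded. A small sketch that reads lsmod-style output; the function name is hypothetical, and it assumes the core module shows up under the name nvidia.

```shell
# Read lsmod-style output on stdin and succeed only if a module named
# exactly "nvidia" is listed (assumption: that is the core driver module name).
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit found ? 0 : 1 }'
}

# Example call:
#   lsmod | nvidia_module_loaded && echo "nvidia module loaded" || echo "nvidia module NOT loaded"
```

If the module is not loaded, dkms status and the output of dmesg may indicate whether the module failed to build for the running kernel or was blocked from loading.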

I am just wondering what the output of

lspci

is, and whether it lists any NVIDIA devices.
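
To make that check a bit more specific, here is a sketch that reads the output of lspci -k (the -k flag adds the kernel driver handling each device) and prints the driver in use for NVIDIA devices only. The function name is made up for illustration, and the parsing assumes the usual layout where device lines start in column one and detail lines are indented.

```shell
# Read "lspci -k" output on stdin and print the kernel driver in use
# for each device whose header line mentions NVIDIA.
nvidia_kernel_driver() {
    awk '
        # Device header lines start with a hex bus address; detail lines are indented
        /^[0-9a-fA-F]/ { in_nvidia = (tolower($0) ~ /nvidia/) }
        in_nvidia && /Kernel driver in use:/ {
            sub(/^.*Kernel driver in use:[ \t]*/, "")
            print
        }
    '
}

# Example call:
#   lspci -k | nvidia_kernel_driver
```

If this prints nouveau rather than nvidia, or prints nothing for a card that lspci does list, the NVIDIA kernel driver is not bound to the GPU, which would be consistent with the nvidia-smi error.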