"NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver" Ubuntu 16.04

Thanks Michael for this.

I have to say that I thought going back to the previous Linux kernel had sorted out my problems. In fact, nvidia-smi was working, but CUDA still would not work; it had to be rebuilt for the kernel again, and then the problem reappeared.

Adopting the method you suggested (i.e., using sudo add-apt-repository ppa:graphics-drivers) worked for me. That is, I had to install the driver (387) separately and then CUDA 9.1 after extracting it from the three components (driver, cuda-toolkit and samples).

It could be, but I checked the kernel on both machines and the names/numbers seem the same. It must be to do with the hardware, though.

Furthermore, I should clarify that the newer Dell machine uses a Titan Xp alongside an older Quadro. The older Dell machines use K20/K40 cards plus even older NVIDIA cards just for basic display. That may be a factor as well.

The NVIDIA driver setup fixed it.
Thanks, Michael!

I second this! I have a GTX 1080 Ti, and I can confirm v387 won’t work properly. nvidia-smi gave the same error as in the OP’s post, and I could not log in with GNOME on Ubuntu 16.04 (the display resolution is very low, and after you enter your password and hit Enter, the screen stays on the login page). Logging in through the MATE desktop works, but the resolution is still very low. At this point, you can open Additional Drivers in System Settings, switch the proprietary driver to 384.111, and hit Apply. Once v384 is installed, reboot your machine and you are good to go.

I faced a similar situation. The easiest fix is to reinstall the driver, which might have been updated. I reinstalled driver 384 and nvidia-smi worked like a charm. http://www.linuxandubuntu.com/home/how-to-install-latest-nvidia-drivers-in-linux

Same here. I have been using GPUs on Google Cloud for a month; I started up my VM today and nvidia-smi no longer works.

After switching the Linux kernel to an older one, nvidia-387 works and so does CUDA.

Does not work on the latest kernel, 4.13.0-26-generic.
Works on 4.10.0-28-generic.

It might be caused by the Linux CPU security updates for Meltdown and Spectre.
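For anyone comparing kernels as in the report above, here is a minimal sketch that checks whether the running kernel is newer than the last one the driver was known to work with. The function name `kernel_is_newer` is made up for illustration, and the sketch assumes GNU `sort -V` is available for version ordering.

```shell
# Hypothetical helper (not part of any NVIDIA or Ubuntu tool).
# Succeeds when kernel release $1 sorts strictly after release $2.
kernel_is_newer() {
    running="$1"
    known_good="$2"
    # Identical releases are not "newer".
    [ "$running" != "$known_good" ] || return 1
    # sort -V orders version strings; the last line is the newest.
    newest="$(printf '%s\n%s\n' "$running" "$known_good" | sort -V | tail -n 1)"
    [ "$newest" = "$running" ]
}
```

It could be called as `kernel_is_newer "$(uname -r)" 4.10.0-28-generic && echo "kernel is newer than the last known-good one"`. A newer kernel does not prove the driver is broken; it only flags that the kernel changed since the driver last worked.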

I use Google Compute Engine, and downgrading the kernel did not work for me. So I deleted the instance and chose a virtual machine with Ubuntu 14.04; all drivers work well there.
Maybe somebody will find this helpful.

+1, same issue on Google Cloud.
Should we expect driver fixes?
Thanks

This has happened to me three times!
My system environment: Ubuntu 16 + NVIDIA driver 384.90
Description: I am sure it worked before, but after some days (maybe 30 or more), running “nvidia-smi” reports: “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”
Reason: Ubuntu 16 updates its kernel automatically! You can check the grub log file, or run “cat /etc/apt/apt.conf.d/10periodic”; the last line reads: “Unattended-upgrade “1””.
Once the kernel has been updated, the NVIDIA driver no longer works properly.
Solution: downgrade the kernel, or select an older kernel version, or delete the latest kernel, or set “Unattended-upgrade” to 0, or reinstall the NVIDIA driver.
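To check the automatic-update setting mentioned above without reading the file by eye, something like the following sketch may help. The function name `unattended_upgrade_state` is hypothetical, and the sketch assumes the setting is written with its full apt key, `APT::Periodic::Unattended-Upgrade`, which the post abbreviates. The file path is passed as an argument because the setting can live in different files under /etc/apt/apt.conf.d/.

```shell
# Hypothetical helper: print "enabled" if the given apt config file turns on
# unattended upgrades (a non-zero value), and "disabled" otherwise.
unattended_upgrade_state() {
    conf_file="$1"
    # Match a line such as:  APT::Periodic::Unattended-Upgrade "1";
    if grep -Eq '^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]+"[1-9][0-9]*";' "$conf_file" 2>/dev/null; then
        echo enabled
    else
        echo disabled
    fi
}
```

For example, `unattended_upgrade_state /etc/apt/apt.conf.d/10periodic` inspects the file named in the post. A missing or unreadable file is reported as "disabled", so it is worth checking that the path exists first.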

I have the same bug on Ubuntu (both 16.04 and 14.04), with kernel 4.13.0 on Ubuntu 16.04. Do you know which Ubuntu kernel I should go back to in order to get this working?

error msg (trivial part):

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Thanks

George

Thanks Michael. I did install nvidia-390 and all is well. Unbelievable… I also tried Ubuntu 14.04, which runs the 4.4 kernel, and got the same error. So maybe it’s not the kernel…

Finally, Michael’s proposal works well on both a local machine and GCP (Google Cloud Platform); I tested both. So if you are in a hurry, I think it can save your day.
George

Confirmed: on GCP, Michael’s instructions fixed the issue.
Here are the commands I used:

sudo apt-get purge 'nvidia*'
sudo add-apt-repository ppa:graphics-drivers
sudo apt-get update
sudo apt-get install nvidia-390

sudo reboot

…after reboot:
lsmod | grep nvidia
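The post-reboot check above can also be wrapped so that it returns a status instead of printing matches. This is only a sketch: `nvidia_module_loaded` is a made-up name, and it assumes the driver's kernel modules have names beginning with `nvidia` in the first column of `lsmod` output.

```shell
# Hypothetical helper: read lsmod-style text on stdin and succeed if any
# module whose name starts with "nvidia" is listed in the first column.
nvidia_module_loaded() {
    awk '$1 ~ /^nvidia/ { found = 1 } END { exit !found }'
}
```

Usage would look like `lsmod | nvidia_module_loaded && echo "nvidia module is loaded"`. Matching on the first column only avoids false hits from other modules that merely mention nvidia in their "Used by" list.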

Hi all,

Before installing CUDA 9.1, ensure that you have the latest NVIDIA R390 driver installed. The latest NVIDIA R390 driver is available at: www.nvidia.com/drivers

The CUDA network repositories have also been updated with the latest R390 driver packages. For more information about installing driver and CUDA from the network repository, see the Linux Installation Guide at: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

Thanks,
Siddharth
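As a rough illustration of the "R390 before CUDA 9.1" advice above, the sketch below compares a driver version string against a minimum major version. The name `driver_major_at_least` is hypothetical, and it only looks at the part before the first dot, so it is a coarse check rather than a full version comparison.

```shell
# Hypothetical helper: succeed if the major part of driver version $1
# (for example "390" in "390.12") is at least the integer $2.
driver_major_at_least() {
    version="$1"
    minimum="$2"
    # Keep everything before the first dot.
    major="${version%%.*}"
    # Reject empty or non-numeric input with a distinct status.
    case "$major" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$major" -ge "$minimum" ]
}
```

Once nvidia-smi is working, one way to feed it a real value is `driver_major_at_least "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" 390`. If nvidia-smi itself fails, the version string will be empty and the function returns status 2.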

Indeed, that worked for me, even though you see CUDA uninstalling it.

I solved this issue simply by reinstalling CUDA.

As a note to others dealing with this: when you install CUDA, DO NOT install it using the deb file. Nvidia needs to fix this, but installing using the deb automatically installs the old, problematic 387 driver, even if you already installed 390 and have it working. The solution is to use the runfile. The runfile asks if you want to install a driver. Say no and everything will be fine.

I struggled with this for longer than I want to admit. Frustrating to realize that I wasn’t doing anything wrong. The installation files from Nvidia have the wrong drivers.

I believe the network deb installer should pick up a newer driver.

kmbutler - did you install the NVIDIA driver using the .run installer? What you’re describing typically happens when you mix the .run installation with the deb/rpm package install using a Linux package manager. The Linux package manager does not detect that a newer driver is installed.

I would recommend installing both the NVIDIA driver and the CUDA toolkit using the package manager to avoid your system being in this state. To install the NVIDIA driver using our CUDA network repo, follow the steps under the “Additional Information” tab on the driver download page:

http://www.nvidia.com/download/driverResults.aspx/130188/en-us

Doing so will install the R390 driver correctly from our network repo. Then, when you attempt to install the CUDA toolkit using the local deb/rpm installer (or the network installer), the package manager will correctly detect that you have a newer driver and prevent installation of the older R387 driver.

Hope this helps.

pramarao - I have everything working and was posting for folks who are still struggling as the documentation is poor. To further complicate things, .run is the only driver install option for my card (GTX 1080) on the Nvidia site (http://www.nvidia.com/download/driverResults.aspx/130646/en-us). I had tried to install using the package manager and a PPA but I couldn’t find R390 on one, so .run was the only option. So it still seems like the only way to get 9.1 working on Ubuntu (at least if you have a GTX 1080) is with a .run file for the driver and a .run file for CUDA.

I haven’t tried it, but would the directions that are under “additional info” for the link you provided work for the GTX 1080?