"NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver" Ubuntu 16.04

Hi,

I’ve installed CUDA 8.0 through the runfile for Ubuntu 16.04 but I can’t get my code that works on my other machine to run. When I try nvidia-smi I get: “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”

My GPUs, as reported by lspci | grep -i nvidia, are:
0b:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)
13:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [GRID K2] (rev a1)

dpkg -l | grep nvidia gives:

rc  nvidia-304                                      304.134-0ubuntu0.16.04.1                      amd64        NVIDIA legacy binary driver - version 304.134
ii  nvidia-367                                      375.39-0ubuntu0.16.04.1                       amd64        Transitional package for nvidia-375
ii  nvidia-375                                      375.39-0ubuntu0.16.04.1                       amd64        NVIDIA binary driver - version 375.39
ii  nvidia-common                                   1:0.4.17.2                                    amd64        transitional package for ubuntu-drivers-common
rc  nvidia-cuda-toolkit                             7.5.18-0ubuntu1                               amd64        NVIDIA CUDA development toolkit
rc  nvidia-opencl-icd-304                           304.134-0ubuntu0.16.04.1                      amd64        NVIDIA OpenCL ICD
ii  nvidia-opencl-icd-375                           375.39-0ubuntu0.16.04.1                       amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                                    0.8.2                                         amd64        Tools to enable NVIDIA's Prime
ii  nvidia-settings                                 378.13-0ubuntu0~gpu16.10.2                    amd64        Tool for configuring the NVIDIA graphics driver

I’ve tried the deb file install, adding the PPA repo and using that, rebooted my machine, but nothing seems to work. Can someone help me?

You seem to have a mix of driver components from several different drivers. This is a broken config.

Follow the steps in the cuda linux install guide to remove every last scrap of NVIDIA software from your machine:

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#handle-uninstallation

then stop, and read the above linked guide in its entirety.

Then pick either the runfile install method, or the package manager install method, and follow the instructions.
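
A quick way to see the kind of mix described above is to list the driver branches of the packages that dpkg reports as fully installed ("ii"). This is only a rough sketch: the helper name installed_nvidia_branches is made up for illustration, and it assumes that driver packages carry versions such as 375.39-..., while helper packages such as nvidia-prime use small version numbers.

```shell
# Hypothetical helper: reads `dpkg -l` style output on stdin and prints the
# distinct driver branches (major versions) of fully installed NVIDIA packages.
installed_nvidia_branches() {
  # Keep only "ii" rows for nvidia-* packages, take the part of the version
  # before the first dot, and skip small numbers (e.g. nvidia-prime 0.8.2).
  awk '$1 == "ii" && $2 ~ /^nvidia-/ {
         split($3, v, "[.]")
         if (v[1] + 0 >= 100) print v[1]
       }' | sort -u
}
```

Run it as: dpkg -l | installed_nvidia_branches. For the listing in the first post it would print 375 and 378 (the driver packages are at 375.39, nvidia-settings is at 378.13). More than one branch is a hint, not proof, that components from different drivers are installed side by side.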

Hi halt9, did you find a solution?

I wasn’t able to resolve the issue, but I was able to determine the cause. I’m using a passthrough VM, and the GPU I use apparently isn’t supported with Ubuntu under passthrough VMs.

I am facing the same issue. I did a deb installation with cuda-repo-ubuntu1604-9-1-local_9.1.85-1_amd64.deb. I have done the following:
sudo apt-get install cuda
sudo apt-get install cuda-drivers
Ref: https://developer.download.nvidia.com/compute/cuda/9.1/Prod/docs/sidebar/CUDA_Installation_Guide_Linux.pdf

I have set the PATH, but nvidia-smi gives the error: “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”

lspci | grep -i nvidia

01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2)

dpkg -l | grep nvidia

ii  nvidia-387                                  387.26-0ubuntu1                              amd64        NVIDIA binary driver - version 387.26
ii  nvidia-387-dev                              387.26-0ubuntu1                              amd64        NVIDIA binary Xorg driver development files
ii  nvidia-modprobe                             387.26-0ubuntu1                              amd64        Load the NVIDIA kernel driver and create device files
ii  nvidia-opencl-icd-387                       387.26-0ubuntu1                              amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                                0.8.2                                        amd64        Tools to enable NVIDIA's Prime
ii  nvidia-settings                             387.26-0ubuntu1                              amd64        Tool for configuring the NVIDIA graphics driver

It should only be necessary to do this:

sudo apt-get install cuda

It should not be necessary to also do this:

sudo apt-get install cuda-drivers

You might simply need to reboot.

Same here. Today I updated to the latest CUDA (9.1.85-1) with driver 387.26-1 and it doesn’t work.

Kernel:
Linux pop-01 4.13.0-26-generic #29~16.04.2-Ubuntu SMP Tue Jan 9 22:00:44 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
I tested on a clean installation and it doesn’t work there either. After rebooting, etc., I only get “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”

$ sudo apt-get install cuda
Reading package lists… Done
Building dependency tree
Reading state information… Done
cuda is already the newest version (9.1.85-1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

$ sudo apt-get install cuda-drivers
Reading package lists… Done
Building dependency tree
Reading state information… Done
cuda-drivers is already the newest version (387.26-1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

On a new system or an old one, the error is the same.

$ dpkg -l | grep nvidia
ii nvidia-387 387.26-0ubuntu1 amd64 NVIDIA binary driver - version 387.26
ii nvidia-387-dev 387.26-0ubuntu1 amd64 NVIDIA binary Xorg driver development files
ii nvidia-modprobe 387.26-0ubuntu1 amd64 Load the NVIDIA kernel driver and create device files
ii nvidia-opencl-icd-387 387.26-0ubuntu1 amd64 NVIDIA OpenCL ICD
ii nvidia-prime 0.8.2 amd64 Tools to enable NVIDIA’s Prime
ii nvidia-settings 387.26-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver

$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)
02:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)
03:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
05:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)
05:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)
06:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
07:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)
07:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
08:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)
08:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
0a:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)
0a:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)

I did an update on my Ubuntu 16.04 test machine today and got the same messages. I wiped the partitions and did a fresh install of Ubuntu 17.10. I went to software updates and changed the graphics driver to nvidia 384.111. No other drivers or packages are installed. I opened a terminal window:

lspci | grep -i nvidia

01:00.0 VGA compatible controller: NVIDIA Corporation GP106 [GeForce GTX 1060 3GB] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP106 High Definition Audio Controller (rev a1)

dpkg -l | grep nvidia

ii  nvidia-384                                 384.111-0ubuntu0.17.10.1                    amd64        NVIDIA binary driver - version 384.111
ii  nvidia-opencl-icd-384                      384.111-0ubuntu0.17.10.1                    amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                               0.8.5                                       amd64        Tools to enable NVIDIA's Prime
ii  nvidia-settings                            384.69-0ubuntu1                             amd64        Tool for configuring the NVIDIA graphics driver

nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Fresh install. The GUI appears to work, but I need to program in CUDA. I’m not sure if going back to 16.04 will help, and I have no idea what to try next. I am very interested in a solution.

I discovered one workaround for this problem: use an older kernel version.

On my Ubuntu 16.04.2 machine I ran an update and saw that it included both a CUDA update and a kernel update.

When I rebooted the machine I got the “NVIDIA-SMI has failed because it couldn’t…” message. I tried a new installation and an old one, and the main difference was the kernel version.

On kernel 4.13.0-26 the NVIDIA driver doesn’t recognize any of the cards, in my case 8 x 1070 (for mining purposes). When I use a previous version (4.10.0-42) and reinstall CUDA 9.1.85-1, the machine works as usual.

I think the CUDA driver has a problem with the new kernel.

Reading the Ubuntu kernel changelog (http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_4.13.0-25.29/changelog), I found some work on PTI (Page Table Isolation) stemming from the Intel CPU security holes.

Maybe the new patches that mitigate the Meltdown and Spectre attacks cause problems with the NVIDIA drivers and utilities.
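
To check whether the driver’s kernel module actually loaded under the running kernel, something like the following might help. It is a sketch under assumptions: nvidia_module_loaded is a hypothetical helper name, and it only looks for a module named exactly nvidia in /proc/modules (or in another file of the same format passed as the first argument).

```shell
# Hypothetical helper: succeeds if a module named exactly "nvidia" is listed
# in a /proc/modules-style file (default: /proc/modules).
nvidia_module_loaded() {
  modules_file="${1:-/proc/modules}"
  # Each line starts with the module name followed by a space, so the
  # trailing space keeps nvidia_uvm, nvidia_drm, etc. from matching.
  grep -q '^nvidia ' "$modules_file"
}
```

For example: nvidia_module_loaded && echo "nvidia module loaded" || echo "nvidia module not loaded on kernel $(uname -r)". If the module is missing and the driver was installed through DKMS, dkms status may show whether it was built for that kernel at all.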

Hi popf6t3s, can you provide some steps to downgrade the kernel version? I have hit the same problem on my Google Cloud instance. I’d appreciate a response.

To downgrade the kernel, check your /boot/grub/grub.cfg file and see whether an older version is installed but not active.

If you don’t have one, you can install it with your package manager (apt-get, yum, dnf, etc.). See:

https://askubuntu.com/questions/700214/how-do-i-install-an-old-kernel

If you find an old one, just select the version to boot by editing /etc/default/grub and putting in something like this:

GRUB_DEFAULT="1>3"

This value changes depending on your installation and updates. Check https://help.ubuntu.com/community/Grub2. In my case grub.cfg has a submenu as its second top-level entry, and the kernel to load was the fourth entry inside it. Remember: in GRUB, entries are numbered from zero (0), which is why I wrote the second entry as “1” and the fourth as “3”: “1>3”. After editing the file, run sudo update-grub and reboot.

You could also use a utility called “Grub Customizer”.

Be aware: if you don’t make this kind of change carefully, you can lose access to your server.
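
Counting entries by hand in grub.cfg is error-prone, so here is a rough sketch that prints each entry with the index you would use in GRUB_DEFAULT. The helper name grub_entry_indices is made up, and it assumes the usual Ubuntu-generated layout: top-level menuentry and submenu lines start at the beginning of the line, entries inside a submenu are indented, and titles are in single quotes.

```shell
# Hypothetical helper: prints "index: title" for each boot entry in a
# grub.cfg-style file (default: /boot/grub/grub.cfg). Entries inside a
# submenu are printed as "parent>child", the form GRUB_DEFAULT accepts.
grub_entry_indices() {
  awk -F"'" '
    /^menuentry /       { printf "%d: %s\n", top++, $2; next }
    /^submenu /         { parent = top++; inner = 0
                          printf "%d: %s (submenu)\n", parent, $2; next }
    /^[ \t]+menuentry / { printf "%d>%d: %s\n", parent, inner++, $2 }
  ' "${1:-/boot/grub/grub.cfg}"
}
```

For example, grub_entry_indices | grep 4.10 would show which index pair belongs to a 4.10 kernel, if one is installed. Treat the output as a starting point and compare it against the file itself before rebooting a remote machine.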

Guys, I am using a Google Cloud instance that is charged at $2.5 / hour, so I cannot redo the entire setup on a different instance. Any update on the issue?

Hi, I have a relatively simple configuration with only one Tesla K80 GPU device attached to my VM on Google Cloud Platform. I use the script below, run with root privileges, to install the CUDA drivers on Ubuntu 16.04.

#!/bin/bash
echo "Checking for CUDA and installing."
# Check for CUDA and try to install.
if ! dpkg-query -W cuda; then
  # The 16.04 installer works with 16.10.
  curl -O http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  dpkg -i ./cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
  apt-get update
  apt-get install cuda -y
fi

The install completes successfully. However, on running nvidia-smi I receive the same error message:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Do let me know if I am missing anything. Thanks!

I just ran into the same problem on one of my Dell machines with K20/40 with the same message:
‘NVIDIA-SMI has failed because it…’

However as mentioned above by ElPop, rebooting with the previous version of Ubuntu kernel (4.10…) sorted out the problem for me.

The funny thing is, only the newer Dell machine was affected. A 6-year-old Dell machine seemed fine the whole time O_O

I was facing the same issue on Google Cloud Platform as well. I had been running with no issues for the last month. Then today I ran nvidia-smi and got the same error message: “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”

I originally installed using the same process as vibhor6wvke

I was able to solve it using the link from ElPop, by rebooting into a previous version of the kernel.
Additionally, this link was helpful for booting from a new kernel on a virtual machine.

https://statusq.org/archives/2012/10/24/4584/

I am also facing the problem above. When I installed CUDA, nvidia-387 was installed automatically, but nvidia-387 couldn’t communicate with my GTX 1080.

In contrast, a manually installed nvidia-384 works with nvidia-smi.

I had the same issue.
I followed the steps below and it is working fine.

Remove the already installed CUDA completely.

Download the CUDA 8.0 deb (local) from here: https://developer.nvidia.com/cuda-80-ga2-download-archive

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb

sudo apt-get update

sudo apt-get install cuda

Then update .bashrc with the paths:

export PATH=$PATH:/usr/local/cuda-8.0/bin
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Your 6-year-old Dell may be running a different kernel. It looks like the Ubuntu 16.04 HWE kernel was recently upgraded from 4.10 to 4.13, but the Ubuntu 16.04 regular kernel is still at 4.4. So, if your new Dell is running HWE and your old one isn’t, that would explain it.
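
Based only on the reports in this thread (failures on 4.13, working setups on 4.10 and 4.4), a small check like the one below could tell which side of that line a given machine is on. kernel_at_least is a hypothetical helper, and it relies on GNU sort -V for version ordering.

```shell
# Hypothetical helper: succeeds if the kernel release ($2, default: the
# running kernel from `uname -r`) is at least the minimum version in $1.
kernel_at_least() {
  min="$1"
  rel="${2:-$(uname -r)}"
  # With version sort, the minimum comes out first only when rel >= min.
  [ "$(printf '%s\n%s\n' "$min" "$rel" | sort -V | head -n1)" = "$min" ]
}
```

For example: kernel_at_least 4.13 && echo "running 4.13 or newer (the HWE kernel reported as failing above)". Comparing the result on both machines would show whether they really are on different kernel series.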

I’m having the same problem on Google Cloud with Ubuntu 16.04 and a Tesla K80.
The fix proposed by ElPop does not work for me: even after downloading and installing a different kernel and changing the GRUB config according to his instructions, the system reboots into the same kernel.
Does anyone have any idea why?

/etc/default/grub :

GRUB_DEFAULT=saved
GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=2
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0"
GRUB_CMDLINE_LINUX=""

command line:

sudo vim /etc/default/grub
sudo grub-set-default "GNU/Linux, with Linux 4.10.0-041000-generic"
sudo grub-reboot "GNU/Linux, with Linux 4.10.0-041000-generic"
sudo update-grub
sudo reboot

@omer.stein1 it didn’t work for me either, and I ended up re-creating the whole Google Compute Engine instance. But after setting it all up again, it still didn’t work. So for me the problem was the driver for the P100 GPU on Google Cloud.

After installing the latest driver (nvidia-390) following this guide, it finally works again:

Also this might be helpful, if the driver update doesn’t fix it for you: