I have to say that I thought going back to the previous Linux kernel had sorted out my problems. In fact, nvidia-smi was working, but CUDA still could not run: it had to be rebuilt for that kernel, and then the problem reappeared.
Adopting the method you suggested (i.e., using sudo add-apt-repository ppa:graphics-drivers) worked for me. That is, I had to install the driver (387) separately and then install CUDA 9.1 after extracting the installer into its three components (driver, cuda-toolkit and samples).
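For anyone who wants the PPA route as commands, here is a rough sketch. It assumes the full PPA name is ppa:graphics-drivers/ppa and that driver packages are named nvidia-<series>, as on Ubuntu 16.04. The run helper and the install_driver_from_ppa function are names made up for illustration. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: print (or run) the commands for installing an NVIDIA driver
# from the graphics-drivers PPA. Helper names here are hypothetical.

# Echo the command when DRY_RUN is 1 (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = driver series, e.g. 384, 387 or 390.
# Assumption: packages are named nvidia-<series>, as on Ubuntu 16.04.
install_driver_from_ppa() {
    series="$1"
    run sudo add-apt-repository -y ppa:graphics-drivers/ppa
    run sudo apt-get update
    run sudo apt-get install -y "nvidia-$series"
}

install_driver_from_ppa 387
```

A reboot is normally needed afterwards so that the new kernel module is the one that gets loaded.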
It could be, but I checked the kernel on both machines and the names/numbers seem the same. It must have to do with the hardware, though.
Furthermore, I should also clarify that the newer Dell machine uses a TitanXp with an older Quadro. The older Dell machines use K20/K40 cards and even older NVIDIA cards just for basic display. That may be a factor as well.
I second this! I have a GTX 1080 Ti, and I can confirm v387 won’t work properly. nvidia-smi gave the same error as in the OP’s post, and I could not log in with GNOME on Ubuntu 16.04 (the display resolution is very low, and after you enter your password and hit Enter, the screen stays on the login page). Logging in through the MATE desktop works fine, but the resolution is also very low. At that point, you can open Additional Drivers in System Settings, switch the proprietary driver to 384.111, and hit Apply. After v384 is installed, reboot your machine and you are good to go.
I use Google Compute Engine, and downgrading the kernel did not work for me. So I deleted the instance and chose a virtual machine with Ubuntu 14.04; all drivers work well there.
Maybe somebody will find this helpful.
This has happened to me three times!
My system environment: Ubuntu 16 + NVIDIA driver 384.90.
Description: I am sure it worked before, but after some days (maybe 30 or more), running the command “nvidia-smi” reports: “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”
Reason: Ubuntu 16 updates its kernel automatically! You can check the grub log file, or run the command “cat /etc/apt/apt.conf.d/10periodic”; the last line reads: “Unattended-upgrade “1””.
Once the kernel has been updated, the NVIDIA driver can no longer work properly.
Solution: downgrade the kernel, or select an older kernel version, or delete the latest kernel version, or set “Unattended-upgrade” to 0, or reinstall the NVIDIA driver.
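A quick way to check the setting described above is sketched below. It assumes the file contains a line of the form APT::Periodic::Unattended-Upgrade "1"; (on some installs the same setting may live in a different file under /etc/apt/apt.conf.d/, such as 20auto-upgrades). The function name is made up for illustration, and the script only reads files.

```shell
#!/bin/sh
# Sketch only: report the Unattended-Upgrade setting from an APT periodic
# config file. The function name is hypothetical.
# $1 = config file (defaults to the path mentioned in the post above).
unattended_upgrade_setting() {
    conf_file="${1:-/etc/apt/apt.conf.d/10periodic}"
    if [ ! -r "$conf_file" ]; then
        echo "unknown"
        return 0
    fi
    # Pull the quoted number from a line such as:
    #   APT::Periodic::Unattended-Upgrade "1";
    value=$(sed -n 's/^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]*"\([0-9]*\)".*/\1/p' "$conf_file" | tail -n 1)
    echo "${value:-unset}"
}

echo "running kernel: $(uname -r)"
echo "unattended upgrades: $(unattended_upgrade_setting)"
```

If the driver was installed through DKMS, running dkms status should also show whether the nvidia module has been built for the kernel reported by uname -r.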
I have the same bug on Ubuntu (both 16.04 and 14.04), with kernel 4.13.0 on Ubuntu 16.04. Do you know which Ubuntu kernel I should go back to for this to work?
error msg (trivial part):
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Thanks
Thanks Michael. I did install nvidia-390 and all is well. Unbelievable… Also, I tried Ubuntu 14.04, which runs on the 4.4 kernel, and got the same error. So maybe it’s not the kernel…
Finally, Michael’s proposal works well on both a local machine and GCP (Google Cloud Platform); I tested both. So if you are in a hurry, I think it can save your day.
George
Before installing CUDA 9.1, ensure that you have the latest NVIDIA driver, R390, installed. The latest NVIDIA R390 driver is available at: Official Drivers | NVIDIA
The CUDA network repositories have also been updated with the latest R390 driver packages. For more information about installing the driver and CUDA from the network repository, see the Linux Installation Guide at: Installation Guide Linux :: CUDA Toolkit Documentation
As a note to others dealing with this: when you install CUDA, DO NOT install it using the deb file. Nvidia needs to fix this, but installing using the deb automatically installs the old, problematic 387 driver, even if you already installed 390 and have it working. The solution is to use the runfile. The runfile asks if you want to install a driver. Say no and everything will be fine.
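For a non-interactive version of "say no to the driver", something like the sketch below may work. It assumes the 9.x runfile accepts the --silent and --toolkit options described in the CUDA Linux installation guide; the runfile name used here is a placeholder, not the real download name, and the function name is made up. The script only prints the command so it can be reviewed before running it.

```shell
#!/bin/sh
# Sketch only: build the command line for a toolkit-only CUDA runfile install,
# so the driver bundled in the runfile is never installed.
# $1 = path to the downloaded runfile (the name used below is a placeholder).
cuda_toolkit_only_command() {
    runfile="$1"
    # --silent skips the interactive prompts; --toolkit selects only the
    # toolkit. Leaving out --driver keeps the bundled driver off the system.
    echo "sudo sh $runfile --silent --toolkit"
}

cuda_toolkit_only_command ./cuda_9.1_linux.run
```

It is worth checking the option list of the runfile you actually downloaded before relying on these flags.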
I struggled with this for longer than I want to admit. Frustrating to realize that I wasn’t doing anything wrong. The installation files from Nvidia have the wrong drivers.
kmbutler - did you install the NVIDIA driver using the .run installer? What you’re describing typically happens when you mix the .run installation with the deb/rpm package install using a Linux package manager. The Linux package manager does not detect that a newer driver is installed.
I would recommend installing both the NVIDIA driver and the CUDA toolkit using the package manager to avoid your system being in this state. To install the NVIDIA driver using our CUDA network repo, follow the steps under the “Additional Information” tab on the driver download page:
Doing so will install the R390 driver correctly from our network repo. Then, when you attempt to install the CUDA toolkit using the local deb/rpm installer (or the network installer), the package manager will correctly detect that you have a newer driver and prevent installation of the older R387 driver.
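To confirm which driver is actually loaded before and after installing the toolkit, one option is to read the version file the kernel module exposes. This is only a sketch: it assumes the first line of /proc/driver/nvidia/version has the usual "NVRM version: ... Kernel Module <version> ..." layout, and the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: print the version of the NVIDIA kernel module that is loaded.
# $1 = version file (defaults to the one the driver exposes under /proc).
loaded_driver_version() {
    version_file="${1:-/proc/driver/nvidia/version}"
    if [ ! -r "$version_file" ]; then
        echo "none"   # module not loaded, or no NVIDIA driver installed
        return 0
    fi
    # Expected first line (illustrative):
    #   NVRM version: NVIDIA UNIX x86_64 Kernel Module  390.25  ...
    sed -n 's/^NVRM version:.*Kernel Module[[:space:]]*\([0-9][0-9.]*\).*/\1/p' "$version_file"
}

echo "loaded NVIDIA driver: $(loaded_driver_version)"
```

Running this before and after the toolkit install (and a reboot) shows whether the driver series changed. A driver installed from a .run file will not appear in the package manager's list of installed packages, which is consistent with the mismatch described above.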
pramarao - I have everything working and was posting for folks who are still struggling as the documentation is poor. To further complicate things, .run is the only driver install option for my card (GTX 1080) on the Nvidia site (Linux x64 (AMD64/EM64T) Display Driver | 390.25 | Linux 64-bit | NVIDIA). I had tried to install using the package manager and a PPA but I couldn’t find R390 on one, so .run was the only option. So it still seems like the only way to get 9.1 working on Ubuntu (at least if you have a GTX 1080) is with a .run file for the driver and a .run file for CUDA.
I haven’t tried it, but would the directions that are under “additional info” for the link you provided work for the GTX 1080?