Installed CUDA 10.0 on Ubuntu 18.04 with deb file. NVIDIA-SMI "couldn't communicate with NVIDIA driver".

PROBLEM STATEMENT:
I installed CUDA 10.0 from https://developer.nvidia.com/cuda-10.0-download-archive, following the instructions at https://docs.nvidia.com/cuda/archive/10.0/cuda-installation-guide-linux/index.html. In the terminal, this is what happens:

>>>nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.
Make sure that the latest NVIDIA driver is installed and running.

WHY CUDA10.0:
TensorFlow (https://www.tensorflow.org/install/gpu) recommends using that version.

WHAT I’VE TRIED:
I’ve actually run ubuntu-drivers autoinstall, and it installs the nvidia-430 drivers. This gets the nvidia-smi command to work, but the problem is that it conflicts with what https://docs.nvidia.com/cuda/archive/10.0/cuda-installation-guide-linux/index.html says, which is that we shouldn’t install any other NVIDIA packages from anywhere other than the official CUDA toolkit.

I’ve reinstalled Ubuntu 18.04 to try again. The question is: should I just run ubuntu-drivers autoinstall and risk breaking my installation, or is there a more proper way to do this?

RELEVANT TERMINAL OUTPUTS:

>>>inxi -Gx
Graphics:  Card: NVIDIA GV104 [GeForce GTX 1180] bus-ID: 01:00.0
           Display Server: x11 (X.Org 1.19.6 )
           driver: (unloaded: modesetting)
           Resolution: 1920x1080@60.00hz, 1920x1080@60.00hz
           OpenGL: renderer: llvmpipe (LLVM 8.0, 256 bits)
           version: 3.3 Mesa 19.0.8 Direct Render: Yes
>>> dpkg -l | grep nvidia
...
ii  nvidia-driver-410                               410.48-0ubuntu1                              amd64        NVIDIA driver metapackage

...

I’m guessing that despite installing CUDA toolkit 10.0, the nvidia-driver-410 is not loaded?
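
One way to check that guess, as a minimal sketch: the kernel lists its loaded modules in /proc/modules, so a line starting with "nvidia " there means the driver module is actually loaded. The function name nvidia_module_loaded and its optional file argument are illustrative only; they are not part of any NVIDIA tool.

```shell
#!/bin/sh
# Succeeds if a modules listing contains the "nvidia" kernel module.
# The listing defaults to /proc/modules; the argument exists only so the
# check can be tried against a saved copy of that file.
nvidia_module_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^nvidia ' "$modules_file" 2>/dev/null
}

if nvidia_module_loaded; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is NOT loaded"
fi
```

If the package shows up in dpkg -l but the module is not loaded, the driver is installed on disk but not in use, which would match the nvidia-smi error above.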

RELEVANT LINKS:
https://devtalk.nvidia.com/default/topic/1062034/cuda-setup-and-installation/can-t-install-anything-but-cuda-10-1-with-driver-version-430/post/5378062/#5378062

nvidia-bug-report.gz (56.8 KB)

  • install the driver from repo/ppa (sudo apt install nvidia-driver-430)
  • download the cuda .deb
  • add the repo to your system (first three steps from install instructions on download page)
  • don’t install cuda
  • instead, run sudo apt install cuda-toolkit-10-0

Hey @generix, so I’ve just done it the “correct” way with inspiration from you!

Basically, it seems that my RTX 2080 doesn’t like the old nvidia-driver-410 that comes with CUDA 10.0. So what I did was the following:

  1. Add the CUDA 10.1 repo from https://developer.nvidia.com/cuda-downloads and install only the driver:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda-drivers

This installs JUST the cuda-drivers (and nvidia-smi works now when I call it!)

  2. Install the other missing packages, using the repo from https://developer.nvidia.com/cuda-10.0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=debnetwork, with

sudo apt install cuda-toolkit-10-0

IMPORTANT: DON’T INSTALL ANY CUDA 10.0 PACKAGES THAT WILL OVERWRITE THE DRIVER WE JUST INSTALLED!!!
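
A hedged way to check this before committing to an install: apt-get's -s (--simulate) flag prints the planned actions as Inst/Remv/Conf lines without changing anything. The filter below keeps only the lines that would install or remove driver packages. The name driver_changes is illustrative, and the package-name prefixes in the pattern are an assumption about how the driver packages are named on this setup.

```shell
#!/bin/sh
# Read `apt-get -s install ...` output on stdin and print only the planned
# installs/removals that touch driver packages. The prefixes below are an
# assumption; adjust them if your driver packages are named differently.
driver_changes() {
    grep -E '^(Inst|Remv) (nvidia-|libnvidia-|cuda-drivers)' || true
}

# Usage (changes nothing on the system):
#   apt-get -s install cuda-toolkit-10-0 | driver_changes
```

Empty output suggests the toolkit install would leave the driver alone; any Inst or Remv line is a reason to stop and look before running the real install.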

Exciting stuff! Thanks again!

I had a similar problem to the OP.

When I run ‘sudo apt install nvidia-driver-430’ I get the following:

nvidia-driver-430 is already the newest version (430.26-0ubuntu0.18.04.2).
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 libnvidia-ifr1-430 : Depends: libnvidia-gl-430 but it is not going to be installed
 libnvidia-ifr1-430:i386 : Depends: libnvidia-gl-430:i386 but it is not going to be installed
 nvidia-driver-430 : Depends: libnvidia-gl-430 (= 430.26-0ubuntu0.18.04.2) but it is not going to be installed
                     Recommends: libnvidia-gl-430:i386 (= 430.26-0ubuntu0.18.04.2)
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

If I then run ‘sudo apt --fix-broken install’ as suggested, I get

Preparing to unpack .../libnvidia-gl-430_430.26-0ubuntu0.18.04.2_i386.deb ...
diversion of /usr/lib/i386-linux-gnu/libGL.so.1 to /usr/lib/i386-linux-gnu/libGL.so.1.distrib by nvidia-340
dpkg-divert: error: mismatch on package
  when removing 'diversion of /usr/lib/i386-linux-gnu/libGL.so.1 by libnvidia-gl-430'
  found 'diversion of /usr/lib/i386-linux-gnu/libGL.so.1 to /usr/lib/i386-linux-gnu/libGL.so.1.distrib by nvidia-340'
dpkg: error processing archive /var/cache/apt/archives/libnvidia-gl-430_430.26-0ubuntu0.18.04.2_i386.deb (--unpack):
 new libnvidia-gl-430:i386 package pre-installation script subprocess returned error exit status 2
Preparing to unpack .../libnvidia-gl-430_430.26-0ubuntu0.18.04.2_amd64.deb ...
diversion of /usr/lib/x86_64-linux-gnu/libGL.so.1 to /usr/lib/x86_64-linux-gnu/libGL.so.1.distrib by nvidia-340
dpkg-divert: error: mismatch on package
  when removing 'diversion of /usr/lib/x86_64-linux-gnu/libGL.so.1 by libnvidia-gl-430'
  found 'diversion of /usr/lib/x86_64-linux-gnu/libGL.so.1 to /usr/lib/x86_64-linux-gnu/libGL.so.1.distrib by nvidia-340'
dpkg: error processing archive /var/cache/apt/archives/libnvidia-gl-430_430.26-0ubuntu0.18.04.2_amd64.deb (--unpack):
 new libnvidia-gl-430:amd64 package pre-installation script subprocess returned error exit status 2
Errors were encountered while processing:
 /var/cache/apt/archives/libnvidia-gl-430_430.26-0ubuntu0.18.04.2_i386.deb
 /var/cache/apt/archives/libnvidia-gl-430_430.26-0ubuntu0.18.04.2_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

Suggestions/help very welcome. Thanks.

Purge and reinstall?
sudo apt purge "nvidia-*"
sudo apt autoremove

Running 'sudo apt purge "nvidia-*"' gives the same error:

The following packages have unmet dependencies:
 libnvidia-ifr1-430 : Depends: libnvidia-gl-430 but it is not going to be installed
 libnvidia-ifr1-430:i386 : Depends: libnvidia-gl-430:i386 but it is not going to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

Then also try purging those:
sudo apt purge "nvidia-*" "libnvidia-*"

@CGG try

sudo apt-get --purge remove "*cublas*" "cuda*"
sudo apt-get --purge remove "*nvidia*"

Thanks. All those suggestions leave me with the same problem of unmet dependencies that aren’t fixed by running ‘sudo apt --fix-broken install’.

So I went back to the installation instructions and tried this:

sudo apt-get install linux-headers-$(uname -r)

That also gave the same unmet dependencies:

sudo apt-get install linux-headers-$(uname -r)
Reading package lists... Done
Building dependency tree       
Reading state information... Done
linux-headers-4.15.0-58-generic is already the newest version (4.15.0-58.64).
linux-headers-4.15.0-58-generic set to manually installed.
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 libnvidia-ifr1-430 : Depends: libnvidia-gl-430 but it is not going to be installed
 libnvidia-ifr1-430:i386 : Depends: libnvidia-gl-430:i386 but it is not going to be installed
 nvidia-driver-430 : Depends: libnvidia-gl-430 (= 430.26-0ubuntu0.18.04.2) but it is not going to be installed
                     Recommends: libnvidia-gl-430:i386 (= 430.26-0ubuntu0.18.04.2)
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

Everything was fine on this machine until I went to a new version of some other software (namd, a molecular dynamics package) that wanted a newer version of CUDA than I had installed. The machine was on 16.04, so the first thing I did was update to 18.04. I’m wondering if that is at the root of the problem; if something went wrong in that update, or if something is left hanging around that shouldn’t be.

Again, any suggestions very welcome!

What’s the output if you install libnvidia-gl-430 manually?
e.g.
sudo apt install nvidia-driver-430 libnvidia-gl-430

Same:

Reading package lists... Done
Building dependency tree       
Reading state information... Done
nvidia-driver-430 is already the newest version (430.26-0ubuntu0.18.04.2).
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 libnvidia-ifr1-430:i386 : Depends: libnvidia-gl-430:i386 but it is not going to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

OTOH, running nvidia-smi yields

nvidia-smi
Sun Sep  1 12:09:07 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   46C    P5    33W / 180W |      0MiB /  8114MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Does this mean everything is actually okay? Is there a way to test this?
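
A note on reading that header, with a small sketch: the "CUDA Version" field in nvidia-smi reports the highest CUDA version the installed driver supports, not which toolkit is installed. So the table shows the driver is loaded and talking to the GPU, but says nothing about the toolkit or the broken packages. The helper below (smi_versions is an illustrative name) just pulls the two version fields out of the header line.

```shell
#!/bin/sh
# Read nvidia-smi output on stdin and print the driver version and the
# maximum CUDA version that driver supports, taken from the header line.
smi_versions() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda_max=\2/p'
}

# Usage:
#   nvidia-smi | smi_versions
# The installed toolkit version is reported separately by:
#   nvcc --version
# (nvcc may not be on PATH; it usually lives under /usr/local/cuda/bin)
```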

Just to follow up, if I again run ‘sudo apt --fix-broken install’ I still get the same errors:

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Correcting dependencies... Done
The following packages were automatically installed and are no longer required:
  lib32gcc1 libatkmm-1.6-1v5 libc6-i386 libcairomm-1.0-1v5 libgtkmm-3.0-1v5 libmodplug1
  libnih-dbus1 libpangomm-1.4-1v5 libsdl1.2debian libtbb2 python-gi
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  libnvidia-gl-430 libnvidia-gl-430:i386
The following NEW packages will be installed:
  libnvidia-gl-430 libnvidia-gl-430:i386
0 upgraded, 2 newly installed, 0 to remove and 2 not upgraded.
3 not fully installed or removed.
Need to get 0 B/50.1 MB of archives.
After this operation, 241 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
(Reading database ... 165739 files and directories currently installed.)
Preparing to unpack .../libnvidia-gl-430_430.26-0ubuntu0.18.04.2_i386.deb ...
diversion of /usr/lib/i386-linux-gnu/libGL.so.1 to /usr/lib/i386-linux-gnu/libGL.so.1.distrib by nvidia-340
dpkg-divert: error: mismatch on package
  when removing 'diversion of /usr/lib/i386-linux-gnu/libGL.so.1 by libnvidia-gl-430'
  found 'diversion of /usr/lib/i386-linux-gnu/libGL.so.1 to /usr/lib/i386-linux-gnu/libGL.so.1.distrib by nvidia-340'
dpkg: error processing archive /var/cache/apt/archives/libnvidia-gl-430_430.26-0ubuntu0.18.04.2_i386.deb (--unpack):
 new libnvidia-gl-430:i386 package pre-installation script subprocess returned error exit status 2
Preparing to unpack .../libnvidia-gl-430_430.26-0ubuntu0.18.04.2_amd64.deb ...
diversion of /usr/lib/x86_64-linux-gnu/libGL.so.1 to /usr/lib/x86_64-linux-gnu/libGL.so.1.distrib by nvidia-340
dpkg-divert: error: mismatch on package
  when removing 'diversion of /usr/lib/x86_64-linux-gnu/libGL.so.1 by libnvidia-gl-430'
  found 'diversion of /usr/lib/x86_64-linux-gnu/libGL.so.1 to /usr/lib/x86_64-linux-gnu/libGL.so.1.distrib by nvidia-340'
dpkg: error processing archive /var/cache/apt/archives/libnvidia-gl-430_430.26-0ubuntu0.18.04.2_amd64.deb (--unpack):
 new libnvidia-gl-430:amd64 package pre-installation script subprocess returned error exit status 2
Errors were encountered while processing:
 /var/cache/apt/archives/libnvidia-gl-430_430.26-0ubuntu0.18.04.2_i386.deb
 /var/cache/apt/archives/libnvidia-gl-430_430.26-0ubuntu0.18.04.2_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

Forget about the --fix-broken option; it never works.
The driver is probably only partially installed, and the complaint was about the 32-bit compat libs. Try:
sudo apt install nvidia-driver-430 libnvidia-gl-430 libnvidia-gl-430:i386

Same:

sudo apt install nvidia-driver-430 libnvidia-gl-430 libnvidia-gl-430:i386
Reading package lists... Done
Building dependency tree       
Reading state information... Done
nvidia-driver-430 is already the newest version (430.26-0ubuntu0.18.04.2).
The following packages were automatically installed and are no longer required:
  lib32gcc1 libatkmm-1.6-1v5 libc6-i386 libcairomm-1.0-1v5 libgtkmm-3.0-1v5 libmodplug1
  libnih-dbus1 libpangomm-1.4-1v5 libsdl1.2debian libtbb2 python-gi
Use 'sudo apt autoremove' to remove them.
The following NEW packages will be installed:
  libnvidia-gl-430 libnvidia-gl-430:i386
0 upgraded, 2 newly installed, 0 to remove and 2 not upgraded.
3 not fully installed or removed.
Need to get 0 B/50.1 MB of archives.
After this operation, 241 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
(Reading database ... 165739 files and directories currently installed.)
Preparing to unpack .../libnvidia-gl-430_430.26-0ubuntu0.18.04.2_i386.deb ...
diversion of /usr/lib/i386-linux-gnu/libGL.so.1 to /usr/lib/i386-linux-gnu/libGL.so.1.distrib by nvidia-340
dpkg-divert: error: mismatch on package
  when removing 'diversion of /usr/lib/i386-linux-gnu/libGL.so.1 by libnvidia-gl-430'
  found 'diversion of /usr/lib/i386-linux-gnu/libGL.so.1 to /usr/lib/i386-linux-gnu/libGL.so.1.distrib by nvidia-340'
dpkg: error processing archive /var/cache/apt/archives/libnvidia-gl-430_430.26-0ubuntu0.18.04.2_i386.deb (--unpack):
 new libnvidia-gl-430:i386 package pre-installation script subprocess returned error exit status 2
Preparing to unpack .../libnvidia-gl-430_430.26-0ubuntu0.18.04.2_amd64.deb ...
diversion of /usr/lib/x86_64-linux-gnu/libGL.so.1 to /usr/lib/x86_64-linux-gnu/libGL.so.1.distrib by nvidia-340
dpkg-divert: error: mismatch on package
  when removing 'diversion of /usr/lib/x86_64-linux-gnu/libGL.so.1 by libnvidia-gl-430'
  found 'diversion of /usr/lib/x86_64-linux-gnu/libGL.so.1 to /usr/lib/x86_64-linux-gnu/libGL.so.1.distrib by nvidia-340'
dpkg: error processing archive /var/cache/apt/archives/libnvidia-gl-430_430.26-0ubuntu0.18.04.2_amd64.deb (--unpack):
 new libnvidia-gl-430:amd64 package pre-installation script subprocess returned error exit status 2
Errors were encountered while processing:
 /var/cache/apt/archives/libnvidia-gl-430_430.26-0ubuntu0.18.04.2_i386.deb
 /var/cache/apt/archives/libnvidia-gl-430_430.26-0ubuntu0.18.04.2_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

There are references to the old nvidia-340 driver, so it seems you have some old cruft left over that is blocking things. Try this:
https://ubuntuforums.org/showthread.php?t=2388026&page=3&p=13761809#post13761809
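
For context, a sketch of what the "cruft" looks like from the dpkg side. The errors above complain about file diversions still registered to nvidia-340, and dpkg-divert --list prints them in the same "diversion of X to Y by PACKAGE" form seen in the log. The helper below (stale_diversions is an illustrative name) extracts the diverted paths owned by a given package. I have not verified that this matches the exact steps in the linked post, so treat it as an assumption.

```shell
#!/bin/sh
# Read `dpkg-divert --list` output on stdin and print the diverted paths
# that are registered to the package named in $1.
stale_diversions() {
    owner="$1"
    sed -n "s|^diversion of \(.*\) to .* by ${owner}\$|\1|p"
}

# Usage (read-only):
#   LC_ALL=C dpkg-divert --list | stale_diversions nvidia-340
# Each printed path could then be released with something like:
#   sudo dpkg-divert --package nvidia-340 --remove <path>
# followed by another attempt at installing the driver packages.
```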

Ha! That seems to have worked. Thanks for your help and patience!

sudo apt --fix-broken install
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Correcting dependencies... Done
The following packages were automatically installed and are no longer required:
  lib32gcc1 libatkmm-1.6-1v5 libc6-i386 libcairomm-1.0-1v5 libgtkmm-3.0-1v5 libmodplug1
  libnih-dbus1 libpangomm-1.4-1v5 libsdl1.2debian libtbb2 python-gi
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  libnvidia-gl-430 libnvidia-gl-430:i386
The following NEW packages will be installed:
  libnvidia-gl-430 libnvidia-gl-430:i386
0 upgraded, 2 newly installed, 0 to remove and 2 not upgraded.
3 not fully installed or removed.
Need to get 0 B/50.1 MB of archives.
After this operation, 241 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
(Reading database ... 165716 files and directories currently installed.)
Preparing to unpack .../libnvidia-gl-430_430.26-0ubuntu0.18.04.2_i386.deb ...
Unpacking libnvidia-gl-430:i386 (430.26-0ubuntu0.18.04.2) ...
Preparing to unpack .../libnvidia-gl-430_430.26-0ubuntu0.18.04.2_amd64.deb ...
Unpacking libnvidia-gl-430:amd64 (430.26-0ubuntu0.18.04.2) ...
Setting up libnvidia-gl-430:i386 (430.26-0ubuntu0.18.04.2) ...
Setting up libnvidia-gl-430:amd64 (430.26-0ubuntu0.18.04.2) ...
Setting up libnvidia-ifr1-430:amd64 (430.26-0ubuntu0.18.04.2) ...
Setting up libnvidia-ifr1-430:i386 (430.26-0ubuntu0.18.04.2) ...
Setting up nvidia-driver-430 (430.26-0ubuntu0.18.04.2) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...

Hi

I am using jtop to check CPU, GPU usage and all related data: