NVidia driver not loading after CUDA 9.1 installation with runfile

Hello everyone,

I am currently reinstalling Ubuntu 16.04, so I have a fresh install. Here is what I have run since:

sudo apt update && sudo apt upgrade
nb_line=$(lspci | grep -ci nvidia)   # number of NVIDIA devices found
if [ "${nb_line}" -eq 0 ]; then
    echo "ERROR: no NVidia device"
    exit 1
fi
gcc --version 2> /dev/null
if [ $? -ne 0 ]; then
    echo "No gcc installed"
    exit 2
fi

# install third-party libraries and the kernel headers
sudo apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev \
    libxi-dev libglu1-mesa libglu1-mesa-dev linux-headers-$(uname -r) dkms

# download the installer files
website='https://developer.nvidia.com/'
# uri='compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run'
uri='compute/cuda/9.1/Prod/local_installers/cuda_9.1.85_387.26_linux'
wget "${website}${uri}" -O cuda.run
chmod u+x cuda.run

# blacklist nouveau module
echo "blacklist nouveau
options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf
sudo update-initramfs -u

# reboot in text mode (the steps below are run by hand after logging back in)
sudo systemctl set-default multi-user.target
sudo reboot

# install 
sudo ./cuda.run --verbose --silent --driver --toolkit --samples --no-opengl-libs > logs
echo 'export PATH=/usr/local/cuda/bin:${PATH}' >> ~/.bashrc
echo '/usr/local/cuda/lib64' | sudo tee -a /etc/ld.so.conf.d/cuda.conf
sudo ldconfig
# reboot in graphical mode
sudo systemctl set-default graphical.target
sudo reboot
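Once the machine is back in text mode after the first reboot, it may be worth confirming that the nouveau blacklist actually took effect before running the installer. A minimal sketch, assuming the standard /proc/modules format (module name in the first column); module_loaded is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: check whether a kernel module is currently loaded.
# module_loaded is a hypothetical helper; it reads /proc/modules (or a file
# given as the second argument), whose first column is the module name.
module_loaded() {
    local name="$1" modules_file="${2:-/proc/modules}"
    [ -r "${modules_file}" ] || return 1
    awk -v m="${name}" '$1 == m { found = 1 } END { exit !found }' "${modules_file}"
}

if module_loaded nouveau; then
    echo "nouveau is still loaded: check /etc/modprobe.d and re-run sudo update-initramfs -u" >&2
else
    echo "nouveau is not loaded"
fi
```

Reading /proc/modules directly avoids depending on the output format of lsmod.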

The kernel is 4.13.0-26-generic.

After the post-installation reboot, the NVIDIA driver is not loaded (there is no /proc/driver/nvidia).

Yet the installer summary reports success, as the logs show:

Installing the NVIDIA display driver...
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x
Welcome to the NVIDIA Software Installer for Unix/Linux
Detected 4 CPUs online; setting concurrency level to 4.
License accepted by command line option.
Installing NVIDIA driver version 387.26.
Running distribution scripts
  Executing /usr/lib/nvidia/pre-install: [##############################] 100%
WARNING: Unable to find a suitable destination to install 32-bit
         compatibility libraries. Your system may not be set up for 32-bit
         compatibility. 32-bit compatibility files will not be installed;
         if you wish to install them, re-run the installation and set a
         valid directory with the --compat32-libdir option.
The distribution-provided pre-install script failed!  Are you sure you want
to continue? (Answer: Continue installation)
For some distributions, Nouveau can be disabled by adding a file in the
modprobe configuration directory.  Would you like nvidia-installer to
attempt to create this modprobe file for you? (Answer: Yes)
One or more modprobe configuration files to disable Nouveau have been
written.  For some distributions, this may be sufficient to disable
Nouveau; other distributions may require modification of the initial
ramdisk.  Please reboot your system and attempt NVIDIA driver installation
again.  Note if you later wish to reenable Nouveau, you will need to delete
these files: /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
Would you like to register the kernel module sources with DKMS? This will
allow DKMS to automatically build a new module, if you install a different
kernel later. (Answer: Yes)
Installing both new and classic TLS OpenGL libraries.
Installing classic TLS 32bit OpenGL libraries.
Searching for conflicting files:
  Searching: [##############################] 100%
Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (387.26):
  Installing: [ ]
Driver file installation is complete.
Installing DKMS kernel module:
  Adding to DKMS: [##                            ]   5%
ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 387.26 -k
       4.13.0-26-generic`: 
       Kernel preparation unnecessary for this kernel.  Skipping...
       
       Building module:
       cleaning build area....
       'make' -j4 NV_EXCLUDE_BUILD_MODULES=''
       KERNEL_UNAME=4.13.0-26-generic modules.......(bad exit status: 2)
       ERROR (dkms apport): binary package for nvidia: 387.26 not found
       Error! Bad return status for module build on kernel:
       4.13.0-26-generic (x86_64)
       Consult /var/lib/dkms/nvidia/387.26/build/make.log for more
       information.
  Adding to DKMS: [##############################] 100%
ERROR: Failed to install the kernel module through DKMS. No kernel module
       was installed; please try installing again without DKMS, or check
       the DKMS logs for more information.
ERROR: Installation has failed.  Please see the file
       '/var/log/nvidia-installer.log' for details.  You may find
       suggestions on fixing installation problems in the README available
       on the Linux driver download page at www.nvidia.com.
Installing the CUDA Toolkit in /usr/local/cuda-9.1 ...
Verifying archive integrity... All good.
Uncompressing NVIDIA CUDA.......
Logging to /tmp/cuda-installer-4719
Creating symbolic link /usr/local/cuda -> /usr/local/cuda-9.1
========================================
Please make sure that
 -   PATH includes /usr/local/cuda-9.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-9.1/lib64, or, add /usr/local/cuda-9.1/lib64 to /etc/ld.so.conf and run ldconfig as root
Please read the release notes in /usr/local/cuda-9.1/doc/
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.1/bin
Installation Complete
Verifying archive integrity... All good.
Uncompressing NVIDIA CUDA Samples..
Logging to /tmp/cuda-installer-4827
========================================
Configuring samples Makefile...
========================================
Please make sure that
 -   PATH includes /usr/local/cuda-9.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-9.1/lib64, or, add /usr/local/cuda-9.1/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the NVIDIA CUDA Samples, run the uninstall script in /usr/local/cuda-9.1/samples
Installation Complete
'uninstall_cuda_9.1.pl' -> '/usr/local/cuda-9.1/bin/uninstall_cuda_9.1.pl'
Installing the CUDA Samples in /home/paul ...
Copying samples to /home/paul/NVIDIA_CUDA-9.1_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-9.1
Samples:  Installed in /home/paul
Please make sure that
 -   PATH includes /usr/local/cuda-9.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-9.1/lib64, or, add /usr/local/cuda-9.1/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-9.1/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-9.1/doc/pdf for detailed information on setting up CUDA.
Logfile is /tmp/cuda_install_1166.log

But the log also contains a few errors:

ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 387.26 -k
       ERROR (dkms apport): binary package for nvidia: 387.26 not found
       Error! Bad return status for module build on kernel:
ERROR: Failed to install the kernel module through DKMS. No kernel module
ERROR: Installation has failed.  Please see the file

Is this why the NVIDIA driver is not loading?
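The log says no kernel module was installed, which would by itself explain the missing /proc/driver/nvidia: without an nvidia module built for the running kernel there is nothing to load. `dkms status` shows this directly. The sketch below assumes its lines look like `nvidia, 387.26, 4.13.0-26-generic, x86_64: installed` (older dkms) or `nvidia/387.26, 4.13.0-26-generic, x86_64: installed` (newer dkms); the exact format varies between dkms versions, and dkms_nvidia_installed is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: does dkms report an installed nvidia module for a given kernel?
# Reads `dkms status` output on stdin; the kernel defaults to the running one.
dkms_nvidia_installed() {
    local kernel="${1:-$(uname -r)}"
    grep -E '^nvidia[,/]' | grep -F ", ${kernel}," | grep -q ': installed'
}

if dkms status 2>/dev/null | dkms_nvidia_installed; then
    echo "nvidia module installed for $(uname -r)"
else
    echo "no nvidia module for $(uname -r); see /var/lib/dkms/nvidia/<version>/build/make.log" >&2
fi
```

A module that failed to build typically stays in the "added" state, so it is not counted as installed here.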

There are apparently some issues with the r387 drivers and the 4.13 kernel:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1742302

https://devtalk.nvidia.com/default/topic/1000340/cuda-setup-and-installation/-quot-nvidia-smi-has-failed-because-it-couldn-t-communicate-with-the-nvidia-driver-quot-ubuntu-16-04/3

https://devtalk.nvidia.com/default/topic/1028864/cuda-setup-and-installation/cuda-toolkit-installation-error-with-latest-linux-kernel-updates/

You might try the 390.12 beta driver posted here:

http://www.nvidia.com/download/driverResults.aspx/128743/en-us

or wait for further updates (new drivers) to be posted in the r390 branch.

If you choose to use the 390.12 beta driver, then install it first/separately, and when you install the CUDA 9.1 toolkit using the runfile installer, select “no” when prompted to install the driver, or use the command line switches to deselect driver installation.
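The two-step order described above can be sketched as follows. Assumptions: the driver runfile name is a placeholder for whatever file the download page gives you, cuda.run is the CUDA 9.1 runfile from the original script, and DRY_RUN=1 (the default here) only prints each command so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: install the 390.12 driver first, then the CUDA 9.1 toolkit without its driver.
# Runfile names are assumptions; DRY_RUN=1 prints commands instead of running them.
set -euo pipefail

DRIVER_RUN="${DRIVER_RUN:-./NVIDIA-Linux-x86_64-390.12.run}"  # assumed driver runfile name
CUDA_RUN="${CUDA_RUN:-./cuda.run}"                           # CUDA 9.1 runfile
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "${DRY_RUN}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: the driver on its own, from a text console with X stopped;
# --dkms registers the module so DKMS can rebuild it for later kernels.
run sudo sh "${DRIVER_RUN}" --silent --dkms
# Step 2: toolkit and samples only; --driver is deliberately left out.
run sudo sh "${CUDA_RUN}" --silent --toolkit --samples
```

Setting DRY_RUN=0 runs the commands for real.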

OK, thank you. Do you have any idea when it will be fixed?

My sense is that 390.12 fixes the issue, based on what I’ve read. I have not personally confirmed that.

Sorry, I’m generally not allowed to make forward-looking comments. This isn’t a forum that NVIDIA generally uses to disclose future information that has not already been disclosed elsewhere.

If you look at the general frequency at which NVIDIA posts drivers, you’ll get some idea for the likely timeframe of a “next” r390 driver posting. I can’t give you specific dates or date range estimates.

I think I’m running into the same type of problem. So does no deb-based installation work with kernel 4.13? They always seem to pull in the 387 driver automatically.

I’m pretty sure 390.12 has been pushed into the online repos.

You could try just using the network deb installer. That may pick up the latest 390 driver.

Or you could try something along the lines of:

sudo apt-get install nvidia-390*
sudo apt-get install cuda-toolkit-9-1

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#package-manager-metas

I get “unable to locate package nvidia-390” when I try to install that driver using “sudo apt-get install nvidia-390”.
What’s the proper way to install the 390 driver?

  1. Read the CUDA linux install guide: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
  2. download the network deb installer http://www.nvidia.com/getcuda
  3. follow steps 1-3 on the installation instructions on the network deb installer page
  4. sudo apt-get install nvidia-390*
  5. sudo apt-get install cuda-drivers
  6. sudo apt-get install cuda-toolkit-9-1

The * in step 4 is important. There are several packages needed there.

As I mentioned in my previous comment, it may be OK just to replace steps 4, 5 and 6 above with:

sudo apt-get install cuda

(following the basic install instructions), because I believe the network cuda package has been updated to pull in nvidia-390 instead of nvidia-387.
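When apt-get reports "Unable to locate package", it can help to check first whether the configured repositories offer the package at all, and to quote the pattern so the shell passes `nvidia-390*` to apt-get unexpanded. A sketch, assuming the `Candidate:` line printed by `apt-cache policy`; pkg_available is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: check that apt has an installable candidate before trying to install.
# pkg_available is a hypothetical helper; it relies on the "Candidate:" line of
# `apt-cache policy`, which shows "(none)" when no version can be installed.
pkg_available() {
    apt-cache policy "$1" 2>/dev/null | grep -q '^ *Candidate: [^(]'
}

if pkg_available nvidia-390; then
    # Single quotes stop the shell from expanding the pattern against local files.
    echo "available; install with: sudo apt-get install 'nvidia-390*'"
else
    echo "nvidia-390 is not available from the configured repositories" >&2
fi
```

If the package is missing here, the repository setup (network vs. local installer) is the thing to look at, not the package name.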

Thank you for the solution - this solved my issue too.

Could not find ‘nvidia-390*’:
$ sudo apt-get install nvidia-390*
Reading package lists… Done
Building dependency tree
Reading state information… Done
E: Unable to locate package nvidia-390*
E: Couldn’t find any package by glob ‘nvidia-390*’
E: Couldn’t find any package by regex ‘nvidia-390*’

It always installs nvidia-387…

$ sudo apt-get install cuda-drivers
[sudo] password for ai:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  bbswitch-dkms dkms lib32gcc1 libc6-i386 libcuda1-387 libjansson4 libvdpau1 libxnvctrl0 mesa-vdpau-drivers nvidia-387 nvidia-387-dev nvidia-modprobe
  nvidia-opencl-icd-387 nvidia-prime nvidia-settings screen-resolution-extra vdpau-driver-all xserver-xorg-legacy
Suggested packages:
  bumblebee libvdpau-va-gl1 nvidia-vdpau-driver nvidia-legacy-340xx-vdpau-driver
The following NEW packages will be installed:
  bbswitch-dkms cuda-drivers dkms lib32gcc1 libc6-i386 libcuda1-387 libjansson4 libvdpau1 libxnvctrl0 mesa-vdpau-drivers nvidia-387 nvidia-387-dev
  nvidia-modprobe nvidia-opencl-icd-387 nvidia-prime nvidia-settings screen-resolution-extra vdpau-driver-all xserver-xorg-legacy
0 upgraded, 19 newly installed, 0 to remove and 3 not upgraded.
Need to get 0 B/86.5 MB of archives.
After this operation, 407 MB of additional disk space will be used.
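The output above shows cuda-drivers resolving to nvidia-387. A dry run (`apt-get -s`, which only simulates the transaction and does not need root) shows which driver branch a metapackage would pull in before committing to it. A sketch, assuming the usual `Inst <package> (<version> ...)` simulation lines; driver_branches is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: list the nvidia-NNN driver packages a simulated install would pull in.
# driver_branches is a hypothetical helper reading `apt-get -s install` output
# on stdin; packages such as nvidia-387-dev or nvidia-modprobe are ignored.
driver_branches() {
    sed -n 's/^Inst \(nvidia-[0-9][0-9]*\) .*/\1/p'
}

# -s only simulates the transaction, so nothing is installed here.
apt-get -s install cuda-drivers 2>/dev/null | driver_branches
```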

This is the whole message:

$ sudo apt-get update
Get:1 file:/var/cuda-repo-8-0-local-ga2  InRelease
Ign:1 file:/var/cuda-repo-8-0-local-ga2  InRelease
Get:2 file:/var/cuda-repo-9-1-local  InRelease
Ign:2 file:/var/cuda-repo-9-1-local  InRelease
Get:3 file:/var/cuda-repo-8-0-local-ga2  Release [574 B]
Get:4 file:/var/cuda-repo-9-1-local  Release
Ign:4 file:/var/cuda-repo-9-1-local  Release
Get:5 file:/var/cuda-repo-9-1-local  Packages
Ign:5 file:/var/cuda-repo-9-1-local  Packages
Get:6 file:/var/cuda-repo-9-1-local  Translation-en_US
Ign:6 file:/var/cuda-repo-9-1-local  Translation-en_US
Get:7 file:/var/cuda-repo-9-1-local  Translation-en
Ign:7 file:/var/cuda-repo-9-1-local  Translation-en
Get:3 file:/var/cuda-repo-8-0-local-ga2  Release [574 B]
Get:5 file:/var/cuda-repo-9-1-local  Packages
Ign:5 file:/var/cuda-repo-9-1-local  Packages
Get:6 file:/var/cuda-repo-9-1-local  Translation-en_US
Ign:6 file:/var/cuda-repo-9-1-local  Translation-en_US
Get:7 file:/var/cuda-repo-9-1-local  Translation-en
Ign:7 file:/var/cuda-repo-9-1-local  Translation-en
Get:5 file:/var/cuda-repo-9-1-local  Packages
Ign:5 file:/var/cuda-repo-9-1-local  Packages
Get:6 file:/var/cuda-repo-9-1-local  Translation-en_US
Ign:6 file:/var/cuda-repo-9-1-local  Translation-en_US
Get:7 file:/var/cuda-repo-9-1-local  Translation-en
Ign:7 file:/var/cuda-repo-9-1-local  Translation-en
Get:5 file:/var/cuda-repo-9-1-local  Packages
Ign:5 file:/var/cuda-repo-9-1-local  Packages
Get:6 file:/var/cuda-repo-9-1-local  Translation-en_US
Ign:6 file:/var/cuda-repo-9-1-local  Translation-en_US
Get:7 file:/var/cuda-repo-9-1-local  Translation-en
Ign:7 file:/var/cuda-repo-9-1-local  Translation-en
Get:5 file:/var/cuda-repo-9-1-local  Packages
Ign:5 file:/var/cuda-repo-9-1-local  Packages
Get:6 file:/var/cuda-repo-9-1-local  Translation-en_US
Ign:6 file:/var/cuda-repo-9-1-local  Translation-en_US
Get:7 file:/var/cuda-repo-9-1-local  Translation-en
Ign:7 file:/var/cuda-repo-9-1-local  Translation-en
Get:5 file:/var/cuda-repo-9-1-local  Packages
Err:5 file:/var/cuda-repo-9-1-local  Packages
  File not found - /var/cuda-repo-9-1-local/Packages (2: No such file or directory)
Get:6 file:/var/cuda-repo-9-1-local  Translation-en_US
Ign:6 file:/var/cuda-repo-9-1-local  Translation-en_US
Hit:9 http://storage.googleapis.com/bazel-apt stable InRelease
Ign:10 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  InRelease
Hit:11 http://cn.archive.ubuntu.com/ubuntu xenial InRelease
Get:12 http://security.ubuntu.com/ubuntu xenial-security InRelease [102 kB]
Hit:13 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Release
Get:15 http://cn.archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
Get:16 http://cn.archive.ubuntu.com/ubuntu xenial-backports InRelease [102 kB]
Get:17 http://security.ubuntu.com/ubuntu xenial-security/main amd64 DEP-11 Metadata [62.7 kB]
Get:18 http://cn.archive.ubuntu.com/ubuntu xenial-updates/main amd64 DEP-11 Metadata [307 kB]
Get:19 http://security.ubuntu.com/ubuntu xenial-security/main DEP-11 64x64 Icons [64.5 kB]
Get:20 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 DEP-11 Metadata [51.2 kB]
Get:21 http://security.ubuntu.com/ubuntu xenial-security/universe DEP-11 64x64 Icons [75.7 kB]
Get:22 http://cn.archive.ubuntu.com/ubuntu xenial-updates/main DEP-11 64x64 Icons [221 kB]
Get:23 http://cn.archive.ubuntu.com/ubuntu xenial-updates/universe amd64 DEP-11 Metadata [190 kB]
Get:24 http://cn.archive.ubuntu.com/ubuntu xenial-updates/universe DEP-11 64x64 Icons [265 kB]
Get:25 http://cn.archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 DEP-11 Metadata [5,892 B]
Get:26 http://cn.archive.ubuntu.com/ubuntu xenial-backports/main amd64 DEP-11 Metadata [3,328 B]
Get:27 http://cn.archive.ubuntu.com/ubuntu xenial-backports/universe amd64 DEP-11 Metadata [4,712 B]
Fetched 1,559 kB in 5s (287 kB/s)
Reading package lists... Done
W: The repository 'file:/var/cuda-repo-9-1-local  Release' does not have a Release file.
N: Data from such a repository can't be authenticated and is therefore potentially dangerous to use.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: Failed to fetch file:/var/cuda-repo-9-1-local/Packages  File not found - /var/cuda-repo-9-1-local/Packages (2: No such file or directory)
E: Some index files failed to download. They have been ignored, or old ones used instead.
ai@ai-server:~/Nvidia/cuda-9.1$ sudo apt-get install nvidia-390*
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package nvidia-390*
E: Couldn't find any package by glob 'nvidia-390*'
E: Couldn't find any package by regex 'nvidia-390*'

To use this method, at this time, be sure to use the “network” deb installer, not the “local” deb installer.
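The apt output above still reads a local `file:/var/cuda-repo-9-1-local` repository. A sketch for finding which APT source files point at such a local repo, assuming the local deb installer added a `deb file:///var/cuda-repo-...` line under /etc/apt/sources.list.d; find_local_cuda_repos is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: list APT source files that still point at a local file: CUDA repo.
# find_local_cuda_repos is a hypothetical helper; the directory defaults to
# /etc/apt/sources.list.d, where the local installer's entry is assumed to live.
find_local_cuda_repos() {
    local dir="${1:-/etc/apt/sources.list.d}"
    grep -lE '^[[:space:]]*deb[[:space:]]+file:/+var/cuda-repo-' "${dir}"/*.list 2>/dev/null
}

find_local_cuda_repos || echo "no local CUDA repositories configured"
```

Removing or commenting out a listed entry and running `sudo apt-get update` again should stop apt from reading that repository.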

@txbob:
Is there any information on the status of the ppc64le R390 drivers? I can’t seem to find the 390 driver, and 387 fails to install for me on:
4.14.14-300.fc27.ppc64le

I had to install the 390 driver via the runfile method, then install CUDA toolkit 9.1 using the .deb file, on Ubuntu 16.04. The trick is to use

sudo apt-get install cuda-toolkit-9-1

instead of

sudo apt-get install cuda

because the latter installs the 367 driver, which interferes with the 390 install, and nothing works anymore.
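To check that an apt install will not replace a runfile-installed driver, the simulated transaction can be inspected first. A sketch, assuming the usual `Inst <package> ...` lines of `apt-get -s install`; pulls_in_driver is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: detect whether installing a package would also pull in an nvidia-NNN
# driver package (e.g. on top of a runfile-installed 390 driver).
# pulls_in_driver is a hypothetical helper reading `apt-get -s install` output on stdin.
pulls_in_driver() {
    grep -qE '^Inst nvidia-[0-9]+ '
}

if apt-get -s install cuda-toolkit-9-1 2>/dev/null | pulls_in_driver; then
    echo "cuda-toolkit-9-1 would install a driver package; not continuing" >&2
else
    echo "no nvidia-NNN driver package in the simulated install"
fi
```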

Thanks, everyone. After a lot of struggle, this thread helped me install CUDA 9.1 successfully on Ubuntu 16.04 :). I have a PNY GeForce 730 graphics card.

I’m summarizing all of the steps that I followed to get this working:

A. Read the CUDA Linux install guide: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
B. Download the network deb installer http://www.nvidia.com/getcuda
C. Follow steps 1-3 on the installation instructions on the network deb installer page

Base Installer Installation Instructions (steps 1-3 below):
sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
sudo apt-get update

D. sudo apt-get install nvidia-390* (NOTE: The * in step D is important)
E. sudo apt-get install cuda-drivers
F. sudo apt-get install cuda-toolkit-9-1

Post-Installation steps:

The PATH variable needs to include /usr/local/cuda-9.1/bin (add this to ${HOME}/.profile or ${HOME}/.bashrc).

To add this path to the PATH variable:

export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}

In addition, when using the runfile installation method, the LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-9.1/lib64 on a 64-bit system, or /usr/local/cuda-9.1/lib on a 32-bit system.

To change the environment variables for 64-bit operating systems:

export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

To change the environment variables for 32-bit operating systems:

export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
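The `${VAR:+:${VAR}}` expansion adds a colon and the old value only when the variable is already non-empty, so no empty entry (which is treated as the current directory) is left behind. Exports in ~/.bashrc also run again in every nested shell, so the variables can accumulate duplicates. A sketch of an idempotent alternative; path_prepend is a hypothetical helper name, not part of the CUDA toolkit:

```shell
#!/usr/bin/env bash
# Sketch: prepend a directory to a colon-separated variable only once.
# path_prepend is a hypothetical helper; it uses the same ${VAR:+:${VAR}} idiom.
path_prepend() {
    local var="$1" dir="$2"
    local current="${!var:-}"
    case ":${current}:" in
        *":${dir}:"*) ;;                                  # already present: leave unchanged
        *) printf -v "${var}" '%s' "${dir}${current:+:${current}}" ;;
    esac
    export "${var}"
}

path_prepend PATH /usr/local/cuda-9.1/bin
path_prepend LD_LIBRARY_PATH /usr/local/cuda-9.1/lib64
```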

********** Reboot the machine using “sudo shutdown -r now” ********************

The NVIDIA Persistence Daemon can be started as the root user by running:

sudo /usr/bin/nvidia-persistenced --verbose

This command should be run on boot. Consult your Linux distribution’s init documentation for details on how to automate this.

To confirm that it is running:

$ ps -ef | grep nvidia
root 156 2 0 16:24 ? 00:00:00 [nvidia-modeset]
nvidia-+ 882 1 0 16:24 ? 00:00:00 /usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose
root 883 2 0 16:24 ? 00:00:01 [irq/29-nvidia]
root 884 2 0 16:24 ? 00:00:00 [nvidia]
xyz 2303 2052 0 16:29 pts/6 00:00:00 grep --color=auto nvidia
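For the "run on boot" part, Ubuntu 16.04 uses systemd, so one option is a small service unit. The unit below is an assumption written for this sketch, not NVIDIA's shipped file; check first whether your driver package already provides one (for example with `systemctl status nvidia-persistenced`), since the ps output above suggests a daemon may already be started for you. write_persistenced_unit is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal systemd unit that starts nvidia-persistenced at boot.
# write_persistenced_unit is a hypothetical helper; the unit contents are an
# assumption. Install it with something like:
#   write_persistenced_unit /tmp/unit && sudo cp /tmp/unit/nvidia-persistenced.service /etc/systemd/system/
#   sudo systemctl enable --now nvidia-persistenced
write_persistenced_unit() {
    local dir="$1"
    mkdir -p "${dir}"
    cat > "${dir}/nvidia-persistenced.service" <<'EOF'
[Unit]
Description=NVIDIA Persistence Daemon

[Service]
# nvidia-persistenced forks into the background unless --foreground is given
Type=forking
ExecStart=/usr/bin/nvidia-persistenced --verbose

[Install]
WantedBy=multi-user.target
EOF
}
```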

Check the NVIDIA driver version:

$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.30 Wed Jan 31 22:08:49 PST 2018
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
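The same file doubles as the "is the driver loaded?" test from the original question: it only exists when the kernel module is loaded. A sketch that prints the loaded version or fails, assuming the `NVRM version: ... Kernel Module <version> ...` line shown above; loaded_driver_version is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: print the loaded NVIDIA kernel module version, or fail if none is loaded.
# loaded_driver_version is a hypothetical helper; it parses the "NVRM version"
# line of /proc/driver/nvidia/version (or a file passed as the first argument).
loaded_driver_version() {
    local file="${1:-/proc/driver/nvidia/version}"
    [ -r "${file}" ] || return 1
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p' "${file}"
}

if version=$(loaded_driver_version); then
    echo "NVIDIA driver ${version} is loaded"
else
    echo "no NVIDIA kernel module loaded (/proc/driver/nvidia/version is missing)" >&2
fi
```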