Newly installed drivers are not found when nvidia-smi is called.

I am trying to set up a new machine and the machine is not seeing the newly installed drivers.
nvidia-bug-report.log.gz (119 KB)

Please try this:

  • install nvidia-prime (sudo apt install nvidia-prime)
  • switch to nvidia (sudo prime-select nvidia)
  • remove stray blacklist files (sudo rm /lib/modprobe.d/blacklist-nvidia.conf /etc/modprobe.d/blacklist-nvidia.conf)
  • update the initrd (sudo update-initramfs -u)
  • reboot
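The same steps can be sketched as a small bash script. This is a hypothetical helper, not an official NVIDIA or Ubuntu tool. It assumes Ubuntu with apt, and it only deletes the blacklist files that actually exist, since a plain `rm` on a missing path reports an error. Nothing on the system changes unless it is run as root with the `--apply` argument, and the reboot is left to you.

```bash
#!/usr/bin/env bash
# Hypothetical helper sketching the steps above; assumes Ubuntu with apt.

# Delete whichever NVIDIA blacklist files exist. $1 is an optional root
# prefix for dry runs; leave it empty on a real system.
remove_stray_blacklists() {
  local root="${1:-}" f removed=0
  for f in "$root/lib/modprobe.d/blacklist-nvidia.conf" \
           "$root/etc/modprobe.d/blacklist-nvidia.conf"; do
    if [ -e "$f" ] && rm -- "$f"; then
      echo "removed $f"
      removed=$((removed + 1))
    fi
  done
  echo "$removed blacklist file(s) removed"
}

main() {
  set -e                            # stop at the first failing step
  apt-get install -y nvidia-prime   # provides prime-select
  prime-select nvidia               # select the NVIDIA GPU profile
  remove_stray_blacklists
  update-initramfs -u               # rebuild the initrd without the blacklist
  echo "Done. Reboot to load the new initrd."
}

# Only modify the system when explicitly asked to.
if [ "${1:-}" = "--apply" ]; then
  main
fi
```

Saved as, say, `fix-nvidia.sh` (any name works), it would be run as `sudo bash fix-nvidia.sh --apply`, followed by a reboot.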

Hello,

I have a very similar problem, the NVIDIA drivers cannot be loaded.
I’m using Ubuntu 18.04 with a GeForce GTX 1060 Mobile NVIDIA card.

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
$ nvidia-settings 
ERROR: NVIDIA driver is not loaded
ERROR: Unable to load info from any available system

What I tried so far:

Switch the nvidia profile

$ sudo prime-select nvidia
Info: the nvidia profile is already set

(Re)move the blacklist files

$ sudo mv /lib/modprobe.d/blacklist-nvidia.conf ~/
$ sudo mv /etc/modprobe.d/blacklist-nvidia.conf ~/

Update the initrd

$ sudo update-initramfs -u

Reboot

However, I still get

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

This is probably because I cannot disable Secure Boot. When I disable Secure Boot in the BIOS, Ubuntu cannot boot; it freezes during boot with this error:

[FAILED] Failed to start NVIDIA Persistence Daemon. 
See 'systemctl status nvidia-persistenced.service' for details

Here is some information:

$ systemctl status nvidia-persistenced.service
● nvidia-persistenced.service - NVIDIA Persistence Daemon
   Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Fri 2019-10-04 11:07:28 IST; 5min ago
  Process: 1168 ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced (code=exited, status=0/SUCCESS)
  Process: 1042 ExecStart=/usr/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose (code=exited, status=1/FAILURE)

Oct 04 11:07:28 pierre-G5 nvidia-persistenced[1052]: Started (1052)
Oct 04 11:07:28 pierre-G5 nvidia-persistenced[1052]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 122 has read and write permissions for t
Oct 04 11:07:28 pierre-G5 nvidia-persistenced[1052]: PID file unlocked.
Oct 04 11:07:28 pierre-G5 nvidia-persistenced[1042]: nvidia-persistenced failed to initialize. Check syslog for more details.
Oct 04 11:07:28 pierre-G5 nvidia-persistenced[1052]: PID file closed.
Oct 04 11:07:28 pierre-G5 nvidia-persistenced[1052]: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
Oct 04 11:07:28 pierre-G5 nvidia-persistenced[1052]: Shutdown (1052)
Oct 04 11:07:28 pierre-G5 systemd[1]: nvidia-persistenced.service: Control process exited, code=exited status=1
Oct 04 11:07:28 pierre-G5 systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
Oct 04 11:07:28 pierre-G5 systemd[1]: Failed to start NVIDIA Persistence Daemon.

The following files don’t exist, which is why the NVIDIA Persistence Daemon cannot start:

/dev/nvidia*
/var/run/nvidia-persistenced
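This state can be confirmed from a shell. If the `nvidia` kernel module is not loaded, the `/dev/nvidia*` nodes are typically not created, and the persistence daemon has nothing to open. The sketch below assumes a Linux system that lists loaded modules in `/proc/modules`. The helper names `module_loaded` and `count_nodes` are hypothetical, and nothing runs unless the script is given `--run`.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: is the nvidia kernel module loaded, and do the
# /dev/nvidia* device nodes exist?

# Succeeds if module $1 appears in the modules list $2 (default /proc/modules).
# The first field is matched exactly, so nvidia_drm does not count as nvidia.
module_loaded() {
  local name="$1" file="${2:-/proc/modules}"
  awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}

# Prints how many existing paths match the glob pattern given in $1.
count_nodes() {
  local n=0 p
  for p in $1; do          # unquoted on purpose so the glob expands
    if [ -e "$p" ]; then n=$((n + 1)); fi
  done
  echo "$n"
}

if [ "${1:-}" = "--run" ]; then
  if module_loaded nvidia; then
    echo "nvidia module: loaded"
  else
    echo "nvidia module: NOT loaded"
  fi
  echo "/dev/nvidia* nodes: $(count_nodes '/dev/nvidia*')"
fi
```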

Any suggestions? Many thanks in advance.

nvidia-bug-report.log.gz (112 KB)

You have secure boot enabled, please disable it in bios.
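You can check the Secure Boot state without entering the BIOS. With Secure Boot enabled, the kernel typically refuses an unsigned nvidia module, which would match the missing `/dev/nvidia*` nodes described above. The sketch below assumes the `mokutil` package is installed; on Ubuntu, `mokutil --sb-state` prints `SecureBoot enabled` or `SecureBoot disabled`. The `sb_state` helper is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify `mokutil --sb-state` output read on stdin.
sb_state() {
  local out
  out=$(cat)
  case "$out" in
    *"SecureBoot enabled"*)  echo enabled ;;
    *"SecureBoot disabled"*) echo disabled ;;
    *)                       echo unknown ;;  # e.g. legacy BIOS boot, no mokutil
  esac
}

if [ "${1:-}" = "--run" ]; then
  # Errors (mokutil missing, non-EFI boot) end up classified as "unknown".
  echo "Secure Boot: $( { mokutil --sb-state 2>&1 || true; } | sb_state )"
fi
```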

Yes, I know that Secure Boot is enabled, but when I disable it, Ubuntu cannot boot because the NVIDIA Persistence Daemon fails to start (see my post above).

Does purging/reinstalling the nvidia driver after disabling secure boot help? Otherwise, please create a new nvidia-bug-report.log with secure boot disabled.

Brilliant! Uninstalling the Nvidia driver worked. Then I was able to disable secure boot. Then I reinstalled the Nvidia driver.

Here’s what I did:

Uninstalling the NVIDIA driver

sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get install ubuntu-desktop
sudo apt-get --purge remove "*cublas*" "cuda*"
sudo apt-get --purge remove "*nvidia*"
sudo add-apt-repository --remove ppa:graphics-drivers/ppa
sudo rm /etc/X11/xorg.conf
sudo apt autoremove
sudo reboot
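Before you disable Secure Boot and reinstall, it can help to confirm that the purge really removed everything. The sketch below lists the packages dpkg still reports as installed whose names contain `nvidia`, `cuda` or `cublas`, the same patterns used in the commands above. The `remaining_pkgs` helper is hypothetical, and the check only runs with `--run`.

```bash
#!/usr/bin/env bash
# Hypothetical check: which nvidia/cuda/cublas packages are still installed?

# stdin: lines in the form printed by
#   dpkg-query -W -f='${Status} ${Package}\n'
# where ${Status} is three words, e.g. "install ok installed".
remaining_pkgs() {
  awk '$1 == "install" && $3 == "installed" && $4 ~ /nvidia|cuda|cublas/ { print $4 }'
}

if [ "${1:-}" = "--run" ]; then
  left=$(dpkg-query -W -f='${Status} ${Package}\n' | remaining_pkgs)
  if [ -n "$left" ]; then
    echo "Still installed:"
    echo "$left"
  else
    echo "No nvidia/cuda packages left."
  fi
fi
```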

Disable secure boot.

Reinstalling the driver

sudo ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
sudo reboot
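`ubuntu-drivers devices` lists the candidate driver packages and marks one of them as `recommended`, which is normally the one `autoinstall` picks. The sketch below extracts that package name so you can see the choice before installing. It assumes lines of the form `driver   : nvidia-driver-430 - distro non-free recommended` as printed on Ubuntu 18.04; the format may differ between releases. The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the package ubuntu-drivers marks "recommended".
# Assumes lines like: "driver   : nvidia-driver-430 - distro non-free recommended"
recommended_driver() {
  awk '$1 == "driver" && $2 == ":" && /recommended/ { print $3; exit }'
}

if [ "${1:-}" = "--run" ]; then
  pkg=$(ubuntu-drivers devices 2>/dev/null | recommended_driver)
  echo "Recommended driver: ${pkg:-none found}"
fi
```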
$ nvidia-smi
Fri Oct  4 13:53:43 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   59C    P0    26W /  N/A |    713MiB /  6078MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1606      G   /usr/lib/xorg/Xorg                            98MiB |
|    0      1958      G   /usr/bin/gnome-shell                          52MiB |
|    0      2585      G   /usr/lib/xorg/Xorg                           391MiB |
|    0      2797      G   /usr/bin/gnome-shell                         105MiB |
|    0      3785      G   ...quest-channel-token=4785359086748578459    62MiB |
+-----------------------------------------------------------------------------+
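For a scripted yes/no answer instead of reading the table, `nvidia-smi` can print single fields with `--query-gpu=driver_version --format=csv,noheader`. The sketch below assumes a driver whose `nvidia-smi` supports these query flags. `driver_ok` is a hypothetical helper that accepts the output only if it starts with a version number such as `430.26`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if stdin's first line looks like a driver
# version (e.g. "430.26"), fail on the "NVIDIA-SMI has failed ..." message.
driver_ok() {
  head -n 1 | grep -Eq '^[[:space:]]*[0-9]+\.[0-9]+'
}

if [ "${1:-}" = "--run" ]; then
  out=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>&1)
  if printf '%s\n' "$out" | driver_ok; then
    echo "Driver loaded, version $out"
  else
    echo "Driver not usable: $out"
  fi
fi
```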

Many thanks, Generix!

I disabled secure boot prior to posting my original query. I appreciate all the help everyone has replied with, but so far nothing has worked.

Looks like you installed the driver using the .run installer. Please uninstall it using its --uninstall option, then install the driver using Ubuntu’s Software & Updates application (Additional Drivers tab).
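Here is one way to check for a leftover .run installation. The .run installer typically leaves an `nvidia-uninstall` command in `/usr/bin`, and the Ubuntu driver packages are not expected to ship one. That path is an assumption about a default .run install, and `run_install_present` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical check for traces of NVIDIA's .run installer.
# The path is an assumption about a default .run install; $1 is an optional
# root prefix used only for testing.
run_install_present() {
  local root="${1:-}"
  [ -e "$root/usr/bin/nvidia-uninstall" ]
}

if [ "${1:-}" = "--run" ]; then
  if run_install_present; then
    echo "Found .run installer traces; uninstall with: sudo nvidia-uninstall"
    echo "(or run the original .run file with --uninstall)"
  else
    echo "No .run installer traces found."
  fi
fi
```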

My issue was a bit different from the one above, but IT SAVED ME. In my case, after I had uninstalled everything using the directions above, I was running:

sudo ubuntu-drivers autoinstall

but I wasn’t paying attention to the messages, and the driver build was failing. When the build fails, it writes a build log into your /var/lib/dkms/nvidia/… directory, and for me that log was the answer: I had screwed up where my C compiler (CC) was pointing. I also added newer versions of GCC, so that may have helped as well (as mentioned in another post).
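Here is a sketch of how to find that build log and check the compiler. It searches under `/var/lib/dkms/nvidia` for DKMS `make.log` files rather than assuming the exact subdirectory. It then compares the GCC major version the running kernel was built with (parsed from `/proc/version`) against the version `cc` reports. The assumed `/proc/version` wording (`gcc version 7.4.0 ...`) is the Ubuntu 18.04 style and may differ on newer kernels. The helper names are hypothetical, and the checks only run with `--run`.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic for a failing DKMS build of the nvidia module.

# stdin: contents of /proc/version; prints the GCC major version, if any.
# Assumes the "(gcc version 7.4.0 ...)" wording used by Ubuntu 18.04 kernels.
kernel_gcc_major() {
  sed -n 's/.*gcc version \([0-9][0-9]*\)\..*/\1/p'
}

# $1: a version string such as "7.4.0" or "7"; prints the major part.
major_of() {
  echo "${1%%.*}"
}

if [ "${1:-}" = "--run" ]; then
  # Show the tail of every DKMS build log found for nvidia.
  find /var/lib/dkms/nvidia -name make.log 2>/dev/null | while read -r log; do
    echo "== $log"
    tail -n 20 "$log"
  done
  echo "cc resolves to: $(readlink -f "$(command -v cc)")"
  echo "kernel built with GCC $(kernel_gcc_major < /proc/version)," \
       "cc reports GCC $(major_of "$(cc -dumpversion)")"
fi
```

If the two major versions differ, the module build may fail or produce a module the kernel rejects, which fits the compiler problem described above.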

Thank you so much!!! This solved my problem after days of uninstall/install cycles.