NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

Running Ubuntu 20.04 with an NVIDIA GTX 1080 card. The system was running fine until yesterday, when I had to reboot it after a power failure.

Tried to connect to an external monitor but found that my system could not detect it.

Ran nvidia-smi and got the error:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Checked the nvidia-persistenced.service with sudo systemctl status nvidia-persistenced.service and got:
● nvidia-persistenced.service - NVIDIA Persistence Daemon
Loaded: loaded (/lib/systemd/system/nvidia-persistenced.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Mon 2024-05-27 14:00:04 CDT; 34min ago
Process: 10553 ExecStart=/usr/bin/nvidia-persistenced --verbose (code=exited, status=1/FAILURE)
Process: 10556 ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced/ (code=exited, status=0/SUCCESS)

May 27 14:00:04 gatame76 systemd[1]: nvidia-persistenced.service: Scheduled restart job, restart counter is at 5.
May 27 14:00:04 gatame76 systemd[1]: Stopped NVIDIA Persistence Daemon.
May 27 14:00:04 gatame76 systemd[1]: nvidia-persistenced.service: Start request repeated too quickly.
May 27 14:00:04 gatame76 systemd[1]: nvidia-persistenced.service: Failed with result ‘exit-code’.
May 27 14:00:04 gatame76 systemd[1]: Failed to start NVIDIA Persistence Daemon.

I assume that the packages I upgraded with sudo apt upgrade broke my NVIDIA/CUDA installation.

Any suggestions on how to best fix this?

I double checked that I am not in secure boot mode.
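For anyone who wants to repeat the Secure Boot check from a terminal, here is a minimal sketch. It assumes the mokutil package is installed; the helper name secure_boot_state is made up for this example and only classifies the text that mokutil --sb-state prints.

```shell
#!/bin/sh
# Sketch: classify the output of `mokutil --sb-state`.
# secure_boot_state is a hypothetical helper name, not part of any tool.
secure_boot_state() {
    case "$1" in
        *"SecureBoot enabled"*)  echo enabled ;;
        *"SecureBoot disabled"*) echo disabled ;;
        *)                       echo unknown ;;
    esac
}

if command -v mokutil >/dev/null 2>&1; then
    # 2>&1 so that error text (e.g. on non-EFI systems) ends up as "unknown"
    secure_boot_state "$(mokutil --sb-state 2>&1)"
else
    echo "mokutil is not installed"
fi
```

If this prints enabled, unsigned NVIDIA kernel modules may be refused at load time, which would be a different problem from the one discussed below.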

Attaching the nvidia-bug-report.log.gz file

Hi there @TR2050499 and welcome to the NVIDIA developer forums.

I recommend looking at the Linux-specific subcategory here in the forums; it has a lot of helpful information on similar issues.

That said, the error here is
[drm] Failed to open DRM device for pci:0000:01:00.0: -19
This PCI device is your mobile GPU, and -19 is the kernel's ENODEV ("no such device") error code. It is always helpful to mention that this is a laptop rather than a desktop, since laptops behave differently because of the built-in GPU and BIOS settings.
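To see which kernel driver, if any, is currently bound to that PCI device, something like the following can help. It is only a sketch: the slot 01:00.0 is taken from the log line above, and driver_in_use is a helper name invented here that pulls the "Kernel driver in use" line out of lspci -k output.

```shell
#!/bin/sh
# Sketch: report the kernel driver bound to the GPU's PCI slot.
# driver_in_use is a hypothetical helper; it reads `lspci -k` text on stdin.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p' | head -n 1
}

if command -v lspci >/dev/null 2>&1; then
    # 01:00.0 is the slot from the DRM error message in this thread
    drv="$(lspci -k -s 01:00.0 2>/dev/null | driver_in_use)"
    echo "driver bound to 01:00.0: ${drv:-none}"
else
    echo "lspci is not installed (package pciutils)"
fi
```

A result of none, or nouveau instead of nvidia, would be consistent with the NVIDIA module not loading.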

In any case, given this message:

warning: the compiler differs from the one used to build the kernel
  The kernel was built by: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
  You are using:           cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0

I suspect that the update rebuilt a kernel dependency with different compiler parameters and broke compatibility with the driver.
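One quick way to test that suspicion is to check whether an NVIDIA kernel module exists at all for the kernel that is currently running. The sketch below assumes the usual /lib/modules layout; MODULES_ROOT and nvidia_module_present are names introduced for this example only.

```shell
#!/bin/sh
# Sketch: is there an nvidia kernel module for the running kernel?
# MODULES_ROOT is a hypothetical override so the check can be pointed at a test tree.
MODULES_ROOT="${MODULES_ROOT:-/lib/modules}"

nvidia_module_present() {
    # $1: kernel release, e.g. the output of `uname -r`
    # Matches nvidia.ko and compressed variants such as nvidia.ko.zst.
    [ -n "$(find "$MODULES_ROOT/$1" -name 'nvidia.ko*' 2>/dev/null | head -n 1)" ]
}

kernel="$(uname -r)"
if nvidia_module_present "$kernel"; then
    echo "nvidia module found for kernel $kernel"
else
    echo "no nvidia module found for kernel $kernel"
fi
```

If no module is found for the running kernel, the driver was most likely not rebuilt or installed for it after the upgrade, and a reinstall should bring the two back in sync.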

The best thing to do is to purge, reboot and reinstall the NVIDIA driver, ideally through Ubuntu's “Additional Drivers” tab in Software & Updates.

Thanks!

Thanks for the response Markus!

Any advice on how to safely purge and reinstall the NVIDIA driver from my system (I’ve done this before but remember there being more than a few steps)? Should I also purge and reinstall the NVIDIA CUDA toolkit?

One way is to use

sudo apt-get remove --purge '^nvidia-.*'

Then reboot and reinstall the driver. To be safe, do the same for any CUDA-related packages.

Then, if you want a clean CUDA installation, consider following the CUDA installation instructions to avoid any driver mismatches.
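Put together, the purge, reboot and reinstall sequence could look roughly like this. Treat it as a sketch under a few assumptions: the '^cuda-.*' pattern assumes the CUDA packages came from apt and are named cuda-*, ubuntu-drivers is used as the command-line counterpart of the Ubuntu driver GUI, and the DRY_RUN switch and function names are made up here. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of the purge / reboot / reinstall sequence.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

purge_nvidia() {
    run sudo apt-get remove --purge '^nvidia-.*'   # command from this thread
    run sudo apt-get remove --purge '^cuda-.*'     # assumption: apt packages named cuda-*
    run sudo apt-get autoremove
}

reinstall_nvidia() {
    run ubuntu-drivers devices            # lists the GPU and the recommended driver
    run sudo ubuntu-drivers autoinstall   # installs the recommended driver
}

purge_nvidia
echo "next: reboot, call reinstall_nvidia, reboot again, then check nvidia-smi"
```

Keeping the reinstall step in a separate function is deliberate, since it has to happen after the reboot that follows the purge.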