NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver after updating Ubuntu 20.04

I have an Ubuntu 20.04 system which has been working perfectly with an nvidia 1080Ti card.

I recently allowed the Ubuntu software updater to install some pending updates, and my nvidia card stopped working.

I did not upgrade to ubuntu-20.04 from a previous version of ubuntu. This system has been running ubuntu-20.04 for a while with no issues, and I just allowed some updates to be installed.

Only one of my three attached monitors works now and when I run the nvidia-settings app, I get a small blank screen instead of seeing the nvidia settings listed.

I think this is because the nvidia driver is not loading anymore:

$ sudo prime-select query
nvidia

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Here is some additional information which may assist in diagnosing what is wrong:

$ dkms status
nvidia, 460.39, 5.8.0-45-generic, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
virtualbox, 6.1.16, 5.8.0-44-generic, x86_64: installed
virtualbox, 6.1.16, 5.8.0-45-generic, x86_64: installed

$ grep -r nvidia /etc/modprobe.d/* /lib/modprobe.d/*
/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb

$ uname -a
Linux shed-ubuntu 5.8.0-45-generic #51~20.04.1-Ubuntu SMP Tue Feb 23 13:46:31 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

I am attaching debug logs:
nvidia-bug-report.log.gz (106.1 KB)

Please run

grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*

to find a file containing

blacklist nvidia

and remove it,
then run

sudo update-initramfs -u

and reboot.
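
A plain grep for nvidia also matches unrelated entries such as blacklist nvidiafb. A minimal sketch of a stricter search follows; the function name find_nvidia_blacklist is hypothetical, and it assumes a grep that supports -H (as GNU grep on Ubuntu does):

```shell
# Print modprobe.d lines that blacklist exactly the "nvidia" module,
# ignoring other modules such as nvidiafb.
# find_nvidia_blacklist is a hypothetical helper name, not part of any package.
find_nvidia_blacklist() {
  grep -HnE '^[[:space:]]*blacklist[[:space:]]+nvidia[[:space:]]*(#.*)?$' "$@" 2>/dev/null
}

# Usage on a real system (paths taken from the command above):
# find_nvidia_blacklist /etc/modprobe.d/* /lib/modprobe.d/*
```

If this prints nothing, no file blacklists the nvidia module itself, and the cause is probably elsewhere.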

Thank you for the quick response. I showed the output of grep nvidia /etc/modprobe.d/* /lib/modprobe.d/* in the original post.

This is what the grep finds: /etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb

Is it correct to remove (or comment out) that entry, update the ramfs and reboot?

That file has to stay where it is.
The driver is installed and available; it just doesn’t load. Please try updating the initrd:
sudo update-initramfs -u
Can you manually load the driver?
sudo modprobe nvidia

I tried:

sudo update-initramfs -u
sync
reboot

and it had no effect (no difference in behavior).

I then tried:

$ sudo modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': Key was rejected by service
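
For context, "Key was rejected by service" generally means the kernel refused the module's signature, which typically happens when Secure Boot is enabled and the module was signed with a key the system does not trust. A rough sketch that maps such messages to a likely cause; the function name explain_modprobe_error and the labels it prints are hypothetical, and the mapping is a heuristic rather than an exhaustive list:

```shell
# Map a modprobe error message to a likely cause (heuristic only).
# explain_modprobe_error and its output labels are hypothetical.
explain_modprobe_error() {
  case "$1" in
    *"Key was rejected by service"*)
      # The module is signed, but the kernel does not accept the signing key.
      echo "signature-rejected" ;;
    *"Required key not available"*)
      # The module is unsigned, or its signing key is not enrolled.
      echo "signature-missing" ;;
    *)
      echo "unknown" ;;
  esac
}

# Example: capture the error text from a failed manual load.
# explain_modprobe_error "$(sudo modprobe nvidia 2>&1)"
```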

Please try removing and re-adding the modules:

sudo dkms remove nvidia/460.39 --all
sudo dkms install nvidia/460.39 -k $(uname -r)
sudo update-initramfs -u

@generix - thank you very much for your help. Your instructions above for removing and re-installing the nvidia kernel modules worked for me.

I did make one change to the commands you cited above.
When I ran the dkms install command, I got quite a few messages like this:

Good news! Module version 460.39 for nvidia.ko
exactly matches what is already found in kernel 5.8.0-45-generic.
DKMS will not replace this module.
You may override by specifying --force.

So I decided to install the kernel modules using the --force option to make sure they got updated. That seemed to work. Here are the commands that I ran:

sudo dkms remove nvidia/460.39 --all
sudo dkms install --force nvidia/460.39 -k $(uname -r)
sudo update-initramfs -u
sync
reboot

Thank you again for all of your help!
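
After a reinstall like the one above, it can help to confirm that DKMS reports the nvidia module as installed for the running kernel. A minimal sketch, assuming the dkms status line format shown earlier in this thread (newer DKMS releases print a different format); the function name nvidia_dkms_state is hypothetical:

```shell
# Print the DKMS state of the nvidia module for one kernel, reading
# `dkms status` text from stdin. Prints "missing" when no line matches,
# which includes entries that are only "added" and not tied to a kernel.
# nvidia_dkms_state is a hypothetical helper name.
nvidia_dkms_state() {
  awk -v kernel="$1" -F', *' '
    $1 == "nvidia" && $3 == kernel {
      # Field 4 looks like "x86_64: installed (WARNING! ...)"
      split($4, parts, ": ")
      split(parts[2], words, " ")
      print words[1]
      found = 1
      exit
    }
    END { if (!found) print "missing" }
  '
}

# Usage on a real system:
# dkms status | nvidia_dkms_state "$(uname -r)"
```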

I have a similar problem running on a VPX PC and the WOLF P5000.

$ sudo prime-select query
nvidia

$ nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

$ dkms status
nvidia, 465.27, 5.8.0-43-generic, x86_64: installed
nvidia, 465.27, 5.8.0-53-generic, x86_64: installed

$ dmesg
[ 1272.612381] NVRM: None of the NVIDIA devices were initialized.
[ 1272.612824] nvidia-nvlink: Unregistered the Nvlink Core, major device number 511
[ 1273.231616] nvidia-nvlink: Nvlink Core is being initialized, major device number 511
[ 1273.232286] NVRM: request_mem_region failed for 0M @ 0x0. This can
NVRM: occur when a driver such as rivatv is loaded and claims
NVRM: ownership of the device’s registers.
[ 1273.232290] nvidia: probe of 0000:04:00.0 failed with error -1
[ 1273.232307] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 1273.232308] NVRM: None of the NVIDIA devices were initialized.

$ grep -r nvidia /etc/modprobe.d
/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb

$ uname -a
Linux tre-TR-E8x-msd 5.8.0-53-generic #60~20.04.1-Ubuntu SMP Thu May 6 09:52:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

I am attaching debug logs:

nvidia-bug-report.log.gz (1.3 MB)

Hey all.

When I run nvidia-smi, I get the following output:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

dkms status:

nvidia, 470.57.02: added

nvidia-settings:

ERROR: NVIDIA driver is not loaded
ERROR: Unable to load info from any available system

The NVIDIA X Server Settings window that opens after the command above is empty.

Here’s a bug report: nvidia-bug-report.log.gz (95.9 KB)

I’m using Ubuntu 20.04 LTS. I needed to update the kernel to 5.13.7 because the touchpad wasn’t working properly on 5.8. This update might have caused the problem I’m facing right now.

Any advice on how to fix the problem? Thanks!

I have the same problem with exact same outputs. Please help!

I disabled the Secure Boot on my machine and now nvidia-smi works.
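
Before changing firmware settings, it may be worth confirming whether Secure Boot is actually enabled. A minimal sketch, assuming the mokutil package is installed; the function name secure_boot_state is hypothetical:

```shell
# Classify the text printed by `mokutil --sb-state`.
# secure_boot_state is a hypothetical helper name.
secure_boot_state() {
  case "$1" in
    *"SecureBoot enabled"*)  echo "enabled" ;;
    *"SecureBoot disabled"*) echo "disabled" ;;
    *)                       echo "unknown" ;;
  esac
}

# Only query the firmware when mokutil is actually installed.
if command -v mokutil >/dev/null 2>&1; then
  secure_boot_state "$(mokutil --sb-state 2>&1)"
fi
```

A result of "unknown" usually means the state could not be read, for example on a system that was not booted in EFI mode.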

Wow, it works for me. Thanks, @tolkjenkot!

Hi there, I ran into this problem recently and found your solution very useful. Thank you very much! :)

Thanks a lot. I dual boot with Win11, and Secure Boot was the problem. Disabling it solved the error.

thanks

I was facing the same issue and was not able to fix it, but it worked after disabling Secure Boot.