Version 460 nvidia driver stopped loading overnight under ubuntu 20.04

Yesterday nvidia drivers version 460 were running correctly on my ubuntu 20.04 system with three GTX1070 gpus. Sometime in the early morning hours, the drivers stopped working. I suspect that may have been due to the installation of some kind of system update, but I don’t know what. When I realized this morning that the drivers didn’t appear to be working, I first tried simply rebooting the system, then I tried powering it down and turning it back on. Next I opened a terminal window and ran nvidia-smi, which reported that it could not communicate with the nvidia driver.

I tried resolving that problem by using the ubuntu Software & Updates app to revert to version 450 (no help), then back to version 460 (no help), then to the nouveau driver and back to version 460 (still no help). Then I did some googling and, based on another thread on this site, tried:

sudo apt install nvidia-prime (latest version already installed)
sudo prime-select nvidia (nvidia profile already selected)
sudo ls /lib/modprobe.d/blacklist-nvidia.conf (not found)
sudo update-initramfs -u
sudo reboot

Unfortunately, things got worse after that, and the system would no longer boot using the default grub selection. By switching to a prior kernel (5.8.0-49) via grub, the system will boot ok, but the nvidia drivers still don’t appear to be loading. Kernel version 5.8.0-50 is the latest (and default) kernel version installed on my system, but I don’t know when that version was installed or which kernel version(s) were in use over the past week or two since I built this system and installed ubuntu, while everything was working correctly.

I next tried adding nvidia-drm.modeset=1 on the end of the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub, followed by sudo update-grub. But update-grub complains that nvidia-drm.modeset=1 is not found when sourcing /etc/default/grub. Then I tried manually executing sudo nvidia-drm, and that command was not found.
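For anyone hitting the same update-grub complaint: one plausible cause is that the parameter ended up outside the quotes. /etc/default/grub is sourced as a shell script, so a stray nvidia-drm.modeset=1 after the closing quote is treated as a command and reported as not found. (nvidia-drm itself is a kernel module, not a program, which is why running it directly also fails.) Below is a minimal sketch of putting the parameter inside the quotes. The helper name add_kernel_param is made up for illustration, and it is safest to try it on a copy of the file first.

```shell
# Sketch: append a kernel parameter INSIDE the quotes of
# GRUB_CMDLINE_LINUX_DEFAULT. add_kernel_param is a hypothetical helper,
# not part of grub. Relies on GNU sed (-i), as shipped with ubuntu.
add_kernel_param() {
    grub_file="$1"   # e.g. a copy of /etc/default/grub
    param="$2"       # e.g. nvidia-drm.modeset=1
    # Do nothing if the parameter is already on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file" | grep -qF -- "$param"; then
        return 0
    fi
    # Move the closing quote so that it follows the new parameter.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$grub_file"
}
```

With a line such as GRUB_CMDLINE_LINUX_DEFAULT="quiet splash", calling add_kernel_param on the file with nvidia-drm.modeset=1 should leave GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvidia-drm.modeset=1". On the real file you would still need sudo and a sudo update-grub afterwards.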

I can run the nvidia-bug-report.sh script if desired, however I wasn’t certain that would be helpful since I am unable to boot my system using the default kernel, so the configuration won’t be the same as for the kernel that is currently booted. I am a linux/ubuntu novice, and really need to get that system back up and running properly, so any help in resolving these issues would be greatly appreciated.

Re-checked for blacklist entries using grep, and found this:
/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb

Should that be deleted? The whole file, or just that single line?
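A note on that grep result: as far as I know, nvidiafb is a separate framebuffer module and not the nvidia module that nvidia-smi talks to, and ubuntu ships that blacklist-framebuffer.conf entry by default, so it is probably not the culprit. A loose grep for "nvidia" matches both names, so here is a minimal sketch of a stricter check. The helper name find_blacklisted is made up for illustration.

```shell
# Sketch: report modprobe.d lines that blacklist exactly the given module.
# find_blacklisted is a hypothetical helper; the directory defaults to
# /etc/modprobe.d but can be overridden with a second argument.
find_blacklisted() {
    module="$1"
    conf_dir="${2:-/etc/modprobe.d}"
    # Anchor the module name so "nvidia" does not also match "nvidiafb".
    grep -rEs "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*$" "$conf_dir"
}
```

Running find_blacklisted nvidia prints nothing (and returns non-zero) when only nvidiafb is blacklisted, while find_blacklisted nvidiafb prints the matching file and line. It may be worth repeating the check against /lib/modprobe.d as well.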

Uploading the nvidia-bug-report.log.gz file: nvidia-bug-report.log.gz (96.0 KB)

Not sure why my post resulted in crickets when other messages to the forum were actively receiving responses. But in any case, I was eventually able to figure out the basic problem and correct it myself. However, I still don’t understand what could have caused these problems in the first place.

After the steps outlined in my prior messages, I next tried scanning through the nvidia-bug-report.log file, and noticed a couple of strange things. First, /var/log/nvidia-installer.log did not exist, and second, LoadModule reported that it could not find the nvidia module, even though I had repeatedly run the install process for different driver versions in several different ways with no obvious error messages. That led me to use apt and dpkg to try to purge all of the nvidia driver related packages. At first that was partially failing because it didn’t like something about nvidia-drm.modeset=1 in /etc/default/grub. So, I removed that and was then able to finish the purges more or less completely. Next:

sudo apt-get install ubuntu-desktop

because I understand that ubuntu-desktop has a dependency on some of the nvidia modules that were purged. Then:

sudo update-initramfs -k all -u
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt install nvidia-driver-460
sudo reboot

That installed the driver ok for the older kernel version (5.8.0-49) that I was running temporarily because the latest kernel (5.8.0-50) wasn’t booting, but failed to build the driver for 5.8.0-50, stating that the kernel headers were missing. I have no idea how or why the kernel header files could have gone missing. My best guess is that an automatic update process installed the most recent version of the kernel, and for some unknown reason failed to install the headers along with it. So at that point:

sudo apt-get install linux-headers-5.8.0.50

For some reason, installing the kernel headers also automatically built the nvidia drivers for that kernel version as well! After yet another reboot (probably the 50th reboot that I’ve done today while trying to figure this out), the system finally booted properly using the default (latest) kernel, and the nvidia drivers loaded and appear to be working correctly.
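The automatic rebuild is probably DKMS at work: as far as I understand it, the ubuntu nvidia driver packages ship the kernel module as DKMS source, and installing a headers package triggers a rebuild for that kernel. To spot the missing-headers situation up front, here is a minimal sketch. It assumes the usual layout in which /lib/modules/<version>/build points at the header tree, and the helper name kernels_missing_headers is made up for illustration.

```shell
# Sketch: list installed kernels that have no headers available for DKMS.
# kernels_missing_headers is a hypothetical helper. Assumption: a missing
# or dangling /lib/modules/<version>/build entry means the headers for
# that kernel are not installed.
kernels_missing_headers() {
    modules_dir="${1:-/lib/modules}"
    for dir in "$modules_dir"/*/; do
        [ -d "$dir" ] || continue          # skip if the glob matched nothing
        version=$(basename "$dir")
        if [ ! -e "${dir}build" ]; then    # -e is false for a dangling symlink
            echo "$version"
        fi
    done
}
```

Each version it prints is a candidate for installing the matching linux-headers package, and dkms status should then show whether the nvidia module was built for that kernel.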

Wow, thanks a lot for troubleshooting this and posting your solution! I’m sorry that so many people completely ignored your issue.

It seems I’m having the exact same issue after a “sudo apt update && sudo apt upgrade” yesterday. I started my system up today, only to notice I was frozen at the Ubuntu login screen.

I tried purging Nvidia, rebooting, and installing nvidia-driver-360, to no avail. I then tried the 2nd to last version of Ubuntu, and I was able to login and get to the GUI, but it froze instantly after loading the GUI.

I’m about to try your further actions now.

Glad you found that useful, and hope you are able to get your problem straightened out. One thing: Thought I read somewhere that driver versions older than somewhere in the 380 to 390 range have compatibility issues with recent Linux kernel versions. So you might want to consider trying a more recent driver version. However, I’m not very certain about that at all, and could be completely mistaken. So please take all of that with a big grain of salt.

Good luck,
Kevin M.

Ahhhh, thanks for the reply. You’re a lifesaver. I mistyped there; I meant to type nvidia-driver-460.

There’s something broken in the standard ubuntu repo. It seems to be trying to pull a removed driver package on upgrade, so you end up with no driver. Just using the driver ppa is a valid work-around.

Thanks for the explanation, that makes sense. However, that wouldn’t seem to explain why the Linux headers would be missing…? Perhaps there is more missing in the repo than just the nvidia driver? I wonder if the ubuntu folks are aware of these problems. Surely I’m not the only one who ran into these issues…

Installation already stops in the download phase, so nothing gets installed. If nothing else but the nvidia driver requires the headers, they won’t get installed either.

Ah, got it, thanks.