Nvidia-driver-460 on Ubuntu 20.04: NVIDIA driver is not loaded

On a fresh install of Ubuntu 20.04 (kernel 5.4.0-66-generic), I installed the NVIDIA driver for a GeForce 940M via
sudo ubuntu-drivers autoinstall
This installed nvidia-driver-460. After rebooting, I noticed that the CPU is extremely busy because of the systemd-udevd process, which apparently keeps trying to load the nvidia driver without success. Output of nvidia-smi:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Output of nvidia-settings:
ERROR: NVIDIA driver is not loaded
ERROR: Unable to load info from any available system

Thank you in advance!
nvidia-bug-report.log.gz (618.4 KB)

There’s a blacklist file missing. Please create
/etc/modprobe.d/nvidiafb-blacklist.conf

blacklist nvidiafb
blacklist nouveau

Then run
sudo update-initramfs -u
and reboot.
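
For reference, the steps above can be sketched as shell commands. This is only a minimal sketch: make_blacklist is a hypothetical helper name introduced for illustration, and the privileged commands are left commented out so that the preview is safe to run as a normal user.

```shell
#!/bin/sh
# make_blacklist MODULE... (hypothetical helper):
# print one "blacklist <module>" line per argument.
make_blacklist() {
    for mod in "$@"; do
        printf 'blacklist %s\n' "$mod"
    done
}

# Preview the file contents without touching the system.
make_blacklist nvidiafb nouveau

# To apply (needs root), uncomment:
# make_blacklist nvidiafb nouveau | sudo tee /etc/modprobe.d/nvidiafb-blacklist.conf
# sudo update-initramfs -u
#
# After rebooting, this check should print nothing:
# lsmod | grep -E '^(nouveau|nvidiafb) '
```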

Done. But now, after rebooting, the screen freezes to black before the login prompt, and the only way out is a forced shutdown, as Ctrl+Alt+F2 does nothing.
I managed to log in in terminal mode by editing the GRUB entry at boot, but even then it gets stuck on these error messages:
PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
device [8086:9d15] error status/mask=00000001/00002000
[ 0] RxErr
However after this, Ctrl + Alt + F2 works, and I can login in terminal mode.

Try setting kernel parameter
pci=noaer
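
One common way to set that parameter persistently on Ubuntu is to add it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then run sudo update-grub. The sketch below shows the edit on a sample file only. add_kernel_param is a hypothetical helper, and the sample contents are assumed rather than taken from the poster's system.

```shell
#!/bin/sh
# add_kernel_param FILE PARAM (hypothetical helper):
# append PARAM to the quoted GRUB_CMDLINE_LINUX_DEFAULT value in FILE,
# unless it is already there. Does nothing if the line is absent.
add_kernel_param() {
    file=$1
    param=$2
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qwF -- "$param"; then
        return 0
    fi
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}

# Try it on a sample file (contents assumed, not read from the real system).
sample=$(mktemp)
printf 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > "$sample"
add_kernel_param "$sample" pci=noaer
grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$sample"
rm -f "$sample"

# On the real system: make the same change to /etc/default/grub as root,
# run "sudo update-grub", reboot, and confirm with: cat /proc/cmdline
```

For a one-off test, the parameter can also be appended to the line starting with "linux" after pressing e in the GRUB menu; that change lasts for a single boot.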

Sorry, same exact situation as in the previous message.

Please create a new nvidia-bug-report.log

Here it is. Of course this was generated while in terminal mode (after logging in via Ctrl+Alt+F2).
nvidia-bug-report.log.gz (274.9 KB)

You get a massive slowdown during boot right after the driver for the SD card reader loads. Can you disable the card reader in the BIOS to check whether that’s the cause?

I am not sure I can do that. Any hint on how to go about it?

Looks like your BIOS doesn’t have an option for it. Try blacklisting the driver instead: create
/etc/modprobe.d/realtek-sd.conf

blacklist rtsx_usb

and run
sudo update-initramfs -u
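
After the next reboot, it may help to confirm which modules actually got loaded. The following is a minimal sketch under stated assumptions: module_loaded is a hypothetical helper that reads /proc/modules, and its optional second argument exists only so the check can be tried against sample data.

```shell
#!/bin/sh
# module_loaded NAME [FILE] (hypothetical helper):
# succeed if NAME is listed as a loaded module. FILE defaults to
# /proc/modules, where each line starts with the module name and a space.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

# Report the state of the modules discussed in this thread.
for mod in rtsx_usb nouveau nvidiafb nvidia; do
    if module_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded"
    fi
done

# "lspci -k" additionally lists the kernel driver bound to each PCI device,
# which shows which driver the card reader really uses.
```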

Apparently nothing changed, but just in case I produced a new bug report.
nvidia-bug-report.log.gz (266.6 KB)

The touchpad driver also has an issue, which is fixed in later kernel versions. Please run a general system upgrade, which should take you to kernel 5.8. If it doesn’t, run
sudo apt install --install-recommends linux-generic-hwe-20.04
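
To check whether the upgrade really moved the machine to kernel 5.8 or later, a small version comparison can be used after rebooting. This is a sketch only: kernel_at_least is a hypothetical helper, and it relies on the version sort (-V) option of GNU sort.

```shell
#!/bin/sh
# kernel_at_least WANT [HAVE] (hypothetical helper):
# succeed if HAVE (default: the running kernel from "uname -r") is >= WANT.
kernel_at_least() {
    want=$1
    have=${2:-$(uname -r)}
    # With version sort, the smaller of the two versions comes first.
    lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

if kernel_at_least 5.8; then
    echo "kernel $(uname -r) is 5.8 or newer"
else
    echo "kernel $(uname -r) is older than 5.8"
fi

# The general upgrade itself (needs root) would be something like:
# sudo apt update && sudo apt full-upgrade
```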

This was a really good step: booting now proceeds normally and I can log in graphically. However, the nvidia driver is still not loading. nvidia-smi and nvidia-settings give the same error messages as in my first message. New bug report:

nvidia-bug-report.log.gz (92.3 KB)

You initially installed the nvidia driver using the runfile installer without DKMS, so it doesn’t survive kernel changes. This is not recommended. Please uninstall it using the --uninstall option, then install the nvidia driver from the Ubuntu repo using the Software & Updates application, possibly enabling the graphics-drivers PPA.
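
On the command line, the same steps could look roughly like the sketch below. It rests on assumptions: has_dkms_nvidia is a hypothetical helper, the sample line only imitates the style of "dkms status" output (its version numbers are illustrative), and the privileged commands are commented out.

```shell
#!/bin/sh
# has_dkms_nvidia (hypothetical helper): read "dkms status" output on stdin
# and succeed if an nvidia module is registered with DKMS.
has_dkms_nvidia() {
    grep -q '^nvidia[,/]'
}

# Sample line in the style dkms prints; the numbers are illustrative only.
if printf 'nvidia, 460.39, 5.8.0-44-generic, x86_64: installed\n' | has_dkms_nvidia; then
    echo "sample: nvidia is managed by DKMS"
fi

# On the real system (needs root), roughly:
# sudo nvidia-uninstall                      # removes a runfile installation
# sudo add-apt-repository ppa:graphics-drivers/ppa   # optional
# sudo apt update
# sudo apt install nvidia-driver-460
# dkms status | has_dkms_nvidia && echo "nvidia is managed by DKMS"
```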

I followed your instructions, but now I am back in the situation of my second message: black screen before login, able to log in only in terminal mode by editing GRUB, PCIe Bus Error, etc.
New bug report (generated while in terminal mode):
nvidia-bug-report.log.gz (269.9 KB)

Please delete /etc/X11/xorg.conf
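
A slightly more cautious variant is to rename the file instead of deleting it, so that it can be restored if needed. Note that this swaps the requested deletion for a rename; the effect on X is the same, since the file is no longer found under its original name. set_aside is a hypothetical helper, demonstrated here on a throwaway directory.

```shell
#!/bin/sh
# set_aside FILE (hypothetical helper):
# rename FILE to FILE.bak if it exists; do nothing otherwise.
set_aside() {
    if [ -e "$1" ]; then
        mv -- "$1" "$1.bak"
    fi
}

# Demonstration on a temporary directory, not on the real configuration.
demo=$(mktemp -d)
touch "$demo/xorg.conf"
set_aside "$demo/xorg.conf"
ls "$demo"
rm -rf "$demo"

# On the real system (needs root):
# sudo mv /etc/X11/xorg.conf /etc/X11/xorg.conf.bak
```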

Now I am able to log in in graphical mode, but the screen is unusable, as can be seen in the attached screenshot.
However, nvidia-smi and nvidia-settings produce error-free outputs.
New bug report also attached.
nvidia-bug-report.log.gz (310.1 KB)

That rather looks like defective video memory. You might want to run cuda memtest to check it.

On which application should I run the cuda-memcheck?

Not memcheck, memtest:
https://sourceforge.net/projects/cudagpumemtest/