Hi all,
After spending four days trying virtually every idea and suggestion I came across, I admit to being utterly dumbfounded by the problem I’m facing. I would appreciate it if somebody could point me in the right direction. My problem is as follows:
I’m currently setting up a new computer with the following specs:
- Mainboard: Mag B560 Tomahawk WIFI by MSI
- Graphics card: GeForce RTX 3060 Ti by KFA2
- CPU: Intel i7-11700KF
- Three screens, two of which are connected via DisplayPort, one via HDMI.
- Secure Boot is disabled. Fast Boot is disabled.
I installed the latest version of Kubuntu Linux (22.04 Jammy Jellyfish, as of this writing) and initially managed to get the system running more or less out of the box. However, after installing a few updates and lots of programs, the problems started:
In a nutshell, it has now turned out to be virtually impossible to get the system to boot, as it always freezes before switching to the graphical login screen. From the beginning I suspected the NVidia drivers to be at least part of the problem, so I started booting with `quiet splash` removed from the GRUB entry and `noplymouth nomodeset` added, to better understand what was happening. I then noticed that the boot process mostly froze when the USB ports were queried, and managed to trace this to one of my tablets. Moving it from a USB 2 port to a USB 3 port made that problem go away, i.e. the boot process no longer stops there and runs roughly until the graphical login screen should appear.
I then read about a bug in some versions of the NVidia driver related to having screens connected via DisplayPort (sorry, I’m not allowed to post more than one link here). However, the versions mentioned in these threads don’t seem to be relevant for my card anymore. What’s more, I also tried booting with only one monitor plugged into an HDMI port, without any luck.
I then proceeded to trying different versions of the driver with varying results:
- 450-server (from the repositories, mentioned in one of the threads above): resulted in lots of errors “Failed to start nvidia persistence daemon”.
- 460.84 (from the NVidia site): fails to compile
- 470.57 (from the NVidia site): fails to compile
- 470.63 (from the NVidia site): freezes before switching to the graphical login screen
- 470.86 (from the NVidia site): freezes before switching to the graphical login screen
- 470.94 (from the NVidia site): freezes before switching to the graphical login screen
- 470.103 (from the repositories): freezes before switching to the graphical login screen
- 510.60 (both from the repositories and from the NVidia site): freezes before switching to the graphical login screen.
In most cases, I managed to at least reach a console by booting to runlevel 3 (adding `3` to the GRUB entry), where I could then uninstall the driver. However, in some cases I had to explicitly blacklist the NVidia drivers (adding `module_blacklist=nvidia` to the GRUB entry). In general, drivers with versions 510.xx seem to cause this more often, but I can’t quite put my finger on the problem.
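Since these parameters are edited by hand in the GRUB entry, it is easy to lose track of which ones a given boot actually received. A minimal sketch for checking this from a console, assuming the running kernel exposes its command line in `/proc/cmdline`; `has_param` is only a helper name made up for this post:

```shell
#!/bin/sh
# has_param is a hypothetical helper: it succeeds if the given word appears
# as a whole, space-separated parameter in a kernel command line string.
has_param() {
    # $1 = kernel command line, $2 = parameter to look for
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Command line the running kernel was booted with (empty if unreadable)
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

# Report each parameter mentioned in this post
for p in quiet splash noplymouth nomodeset 3 module_blacklist=nvidia; do
    if has_param "$cmdline" "$p"; then
        echo "present: $p"
    else
        echo "absent:  $p"
    fi
done
```

Matching whole words matters here: a plain substring search for `nvidia` would also hit `module_blacklist=nvidia`.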
I then tried to boot an older kernel and switched from 5.15.0-27 to 5.15.0-25 (no other kernel is available). At first, this seemed to improve things, as I managed to get the system to boot using driver versions 470.63 and onwards. Alas, my joy was short-lived, as the problems came back. I now seem to be able to boot to a working system roughly once in 50 attempts. In these cases, however, only one of my screens is detected.
That made me suspect it might have something to do with timing. I came across a thread that suggested to “load the Nvidia kernel modules a little earlier in the boot process” by modifying `/etc/modules`. But this didn’t seem to have any effect in my case.
The fact that only one of my three monitors was detected whenever I was able to boot made me suspect that the driver didn’t even get loaded. And indeed: I now noticed “(EE) Failed to load module nvidia (module does not exist, 0)” in `/var/log/Xorg.0.log`, so I searched for the driver and found it was installed in `/usr/lib/x86_64-linux-gnu/nvidia/xorg`. At that point I had already installed, purged and reinstalled different versions of the driver many times, so I wondered why none of these steps had set the correct ModulePath. Nonetheless, I manually added the line `ModulePath "/usr/lib/x86_64-linux-gnu/nvidia/xorg,/usr/lib/xorg/modules"` to `/etc/X11/xorg.conf` and managed to at least get the driver to load during boot. Unfortunately, I now get the error “(EE) NVIDIA: Failed to initialize the NVIDIA kernel module”. The output of `dmesg` doesn’t contain any helpful hints.
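To narrow down whether the kernel module is built and loaded for the running kernel at all, the checks for this step can be sketched roughly as follows. This is only a sketch with read-only commands; it assumes the Xorg log is in its usual default location, and `xorg_errors` is a helper name made up for this post:

```shell
#!/bin/sh
# Sketch only: read-only checks, nothing here changes the system.

# xorg_errors is a hypothetical helper: print only the timestamped (EE)
# lines of an Xorg log, skipping the legend line at the top of the file.
xorg_errors() {
    grep -E '^\[[ 0-9.]+\] \(EE\)' "$1"
}

# Is any nvidia kernel module loaded right now?
lsmod 2>/dev/null | grep -i '^nvidia' || echo "no nvidia module loaded"

# Which driver version does the on-disk module for the running kernel report?
modinfo -F version nvidia 2>/dev/null || echo "no nvidia module found for $(uname -r)"

# Did DKMS build the module for this kernel at all?
if command -v dkms >/dev/null 2>&1; then dkms status || true; fi

# Kernel messages from the current boot that mention the driver
journalctl -k -b 2>/dev/null | grep -iE 'nvidia|nvrm' || echo "no nvidia kernel messages"

# Errors logged by the X server (assumed default log path)
log=/var/log/Xorg.0.log
if [ -r "$log" ]; then xorg_errors "$log" || true; fi
```

Comparing the version printed by `modinfo` with the installed user-space driver version is the point of the second check, since several driver versions were installed and removed along the way.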
Lastly, in my desperation I tried a simple `sudo ubuntu-drivers autoinstall` (which installed driver version 510.60). The system first booted into the latest kernel (-27, see above), which did not work, but after switching back to the previous kernel version (-25) the system immediately booted with everything working perfectly. However, after only a single reboot, the problems came back, and now even the autoinstall doesn’t have any effect anymore.
By the way:
- I made sure the `nouveau` driver is blacklisted.
- I tried to recreate the file `xorg.conf` using `sudo nvidia-xconfig` many times and finally even deleted the whole directory `/etc/X11`. None of this had any effect.
I’ve now run out of ideas and would be grateful for any hints or suggestions to get my system up and running.
Thanks in advance.
nvidia-bug-report.log.gz (69.2 KB)