X has been freezing a lot lately, almost every day at least once. The screen freezes completely: the cursor can still move, but the system does not respond to keyboard input (apart from SysRq, so I can force a reboot, but it won’t switch tty even after SysRq+r).
The system also does not seem to recover on its own; simply waiting doesn’t help.
[ 6270.576] (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x0000fc50, 0x0000fc6c)
[ 6277.576] (EE) NVIDIA(GPU-0): WAIT (1, 8, 0x8000, 0x0000fc50, 0x0000fc6c)
(Full xorg.log.old: https://paste.pound-python.org/show/mngWuMPLALEJ0XoYmEer/)
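To see how often these errors occur, the WAIT lines can be counted in the log. This is only a minimal sketch: count_nvidia_waits is a hypothetical helper name, and the log path in the usage note is an assumption (the usual location of the previous X log).

```shell
# Count NVIDIA "WAIT" error lines in an Xorg log file.
# Argument: path to the log file.
count_nvidia_waits() {
    # -c prints the number of matching lines, -F matches the pattern literally
    grep -cF '(EE) NVIDIA(GPU-0): WAIT' "$1"
}
```

Usage would look like: count_nvidia_waits /var/log/Xorg.0.log.old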
Neither dmesg nor syslog seems to report anything out of the ordinary, as far as I can tell. Today’s syslog: https://paste.pound-python.org/show/cBhNiWAMSQwcau1XIQ77/
Anyone got any idea what’s going on here?
Is this maybe a driver bug, or is my GPU dying?
(emerge --info: https://paste.pound-python.org/show/3cxt0UERiLWimv3QqYYS/)
When I ssh into the machine and try to reload the nvidia, nvidia_drm and nvidia_modeset modules, the rmmod -f nvidia command never completes, and the machine then refuses to shut down. Here’s a syslog: https://paste.pound-python.org/show/iWv1yZjShTSXp8nqHWON/
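For reference, a minimal sketch of the unload order, assuming the usual dependency chain in which nvidia_drm depends on nvidia_modeset, which in turn depends on nvidia. The names unload_nvidia_modules and DRY_RUN are hypothetical. By default the function only prints the commands, and it uses plain rmmod rather than rmmod -f.

```shell
# Unload the NVIDIA kernel modules in dependency order.
# With DRY_RUN unset or set to 1, only print the commands.
# With DRY_RUN=0 (as root, with X stopped), actually run them.
unload_nvidia_modules() {
    for mod in nvidia_drm nvidia_modeset nvidia; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "rmmod $mod"
        else
            # Stop at the first module that cannot be removed
            rmmod "$mod" || return 1
        fi
    done
}
```

If other modules such as nvidia_uvm are loaded, they would need to be removed before nvidia as well.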
And some interesting lines from this log:
Feb 11 09:56:10 andrew-gentoo-pc kernel: [ 2151.026289] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Feb 11 09:56:20 andrew-gentoo-pc kernel: [ 2161.122295] WARNING: CPU: 11 PID: 5556 at /var/tmp/portage/x11-drivers/nvidia-drivers-418.30/work/kernel/nvidia/nv-rsync.c:44 nv_destroy_rsync_info+0x25/0x30 [nvidia]
Sddm fails to stop X:
[10:04:03.092] (WW) DAEMON: Signal received: SIGTERM
[10:04:03.092] (II) DAEMON: Socket server stopping...
[10:04:03.092] (II) DAEMON: Socket server stopped.
[10:04:03.092] (II) DAEMON: Display server stopping...
[10:04:08.097] (WW) DAEMON: QProcess: Destroyed while process ("/usr/libexec/sddm-helper") is still running.
[10:04:08.098] (II) DAEMON: Display server stopping...
[10:04:13.103] (WW) DAEMON: QProcess: Destroyed while process ("/usr/bin/X") is still running.
Also, maybe I should add that I am using the boot parameter “nvidia-drm.modeset=1” because nvidia’s documentation claims that this eliminates/reduces tearing (and enables frame synchronization between the GPUs, I think). However, I still see tearing sometimes, but only on the monitor connected to the nvidia GPU.
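A quick way to confirm that the parameter actually reached the kernel is to look for it on the kernel command line. This is a minimal sketch, and has_drm_modeset is a hypothetical helper name.

```shell
# Return success if the given kernel command line
# contains the nvidia-drm.modeset=1 parameter.
# Argument: the command line as a single string.
has_drm_modeset() {
    # Pad with spaces so the parameter matches only as a whole word
    case " $1 " in
        *" nvidia-drm.modeset=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage would look like: has_drm_modeset "$(cat /proc/cmdline)" && echo "modeset requested"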
I also have the following in /usr/share/sddm/scripts/Xsetup:
xrandr --setprovideroutputsource modesetting NVIDIA-0
xrandr --auto --output HDMI-1-2 --mode 1600x900 --pos 3360x90 --output DVI-D-0 --mode 1920x1080 --pos 1440x0 --output DP-1-2 --mode 1440x900 --pos 0x90
The first line enables PRIME; nvidia’s documentation has “xrandr --auto” as the second line.
However, when I use the --auto option, KDE completely messes up the monitor configuration (even though sddm detects it just fine): all monitors are placed on top of each other in a configuration that resembles the duplication configuration but is not quite the same, because the resolutions do match. It used to work fine with 2 monitors, but ever since I added the third I have needed to specify the correct configuration manually.
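As a sanity check on the manual --pos values, the x offsets for a left-to-right layout are just running sums of the monitor widths (1440, 1920, 1600). This is a minimal sketch, and layout_offsets is a hypothetical helper, not part of xrandr.

```shell
# Print the x offset of each monitor when placed side by side.
# Arguments: monitor widths in pixels, leftmost monitor first.
layout_offsets() {
    x=0
    out=""
    for width in "$@"; do
        # Append the current offset, space-separated
        out="$out${out:+ }$x"
        x=$((x + width))
    done
    echo "$out"
}
```

Running layout_offsets 1440 1920 1600 prints 0 1440 3360, which matches the x positions in the Xsetup script. The y offset of 90 on the two 900-pixel-high monitors is consistent with centering them vertically against the 1080-pixel-high one, since (1080 - 900) / 2 = 90.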
I do not have efifb enabled because when I enable it, I get a low resolution framebuffer on the monitor connected to the nvidia GPU, and no framebuffer on the monitors connected to the intel GPU.
Instead I have it disabled which gives me no framebuffer on the monitor connected to the nvidia GPU, but it does give me a high-resolution framebuffer on the monitors connected to the intel GPU.
I have also had problems with the HDMI output of the nvidia GPU (see my other thread here: [SOLVED]Problems with nvidia-drivers and 2nd monitor on IGPU).
The monitor would slowly become completely white whenever I logged in from sddm, switched from a tty to X, or the monitor configuration changed.
I have not had this problem ever since I have been using the DVI-D output instead, indicating that this was not a problem with the monitor, but with the GPU.
See also my thread on the Gentoo forums; I tried to copy most of it here, but I might have missed something.
nvidia-bug-report.log.gz (104 KB)