Nvidia and Arch Linux Problems, Black Screen of Death.

I’ve recently had issues with my desktop machine being unable to reach a desktop session after a good 2.5 months of uptime (as good as it gets with linux these days, almost windoze-ish), and it mostly seems to be an nvidia issue.

The short of it: my desktop (kde) crashed one day and never came back. I get to SDDM, log in to start kde, then the screen goes black and the desktop never starts. KDE is at best cantankerous between releases, so I figured the KDE team had just screwed up again. After harping on their dev team, they seem to think it’s an nvidia issue, and I’m prone to concur.

I’d been troubleshooting KDE for a while, as it’s an unstable mess with a very large desktop (11520x2160, or 3x 4k displays), but it seems the driver somehow decided to become unstable as part of the process. It finally crashed after around 45 days of uptime, which is usually pretty good for KDE these days, but this time it wouldn’t come back up to a desktop.

Looking at the kwin debug output from qdbus, my desktop previously started normally using the normal GL and GLX drivers. Reviewing it after the crash, when I start SDDM and launch into a desktop, it starts in EGL mode with GL ES 2.0 drivers instead of the normal linux GL drivers, and hangs accordingly.

I was on kernel 5.2.2 and nvidia 430.43-1 at the start of this. I upgraded, since I’ve had arch break itself before when installing random packages upgraded one thing without upgrading another, but saw no change moving to kernel 5.3.1 and driver 435.21. Rolling back didn’t help either, so I was at a loss as to what might have happened.
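One thing worth ruling out after a partial upgrade or rollback is a mismatch between the loaded nvidia kernel module and the installed userspace libraries. The following is only a rough sketch of such a check, not something taken from this thread: strip_pkgrel and versions_match are made-up helper names, and it assumes the userspace package is called nvidia-utils and that /sys/module/nvidia/version is readable on the affected machine.

```shell
#!/bin/sh
# Sketch: compare the loaded NVIDIA kernel module version with the installed
# userspace package version. Helper names are hypothetical.

# Drop an optional pacman epoch ("1:") and the trailing "-pkgrel" from a version.
strip_pkgrel() {
    printf '%s\n' "$1" | sed -e 's/^[0-9]*://' -e 's/-[^-]*$//'
}

# Succeed when the module version equals the package's upstream version.
versions_match() {
    [ "$1" = "$(strip_pkgrel "$2")" ]
}

# Only run the live check where both the module and pacman are present.
if [ -r /sys/module/nvidia/version ] && command -v pacman >/dev/null 2>&1; then
    loaded=$(cat /sys/module/nvidia/version)
    # Assumption: the userspace libraries come from the nvidia-utils package.
    installed=$(pacman -Q nvidia-utils 2>/dev/null | awk '{print $2}')
    if versions_match "$loaded" "$installed"; then
        echo "module and userspace agree: $loaded"
    else
        echo "mismatch: module=$loaded package=$installed"
    fi
fi
```

If the two disagree, a reboot after the upgrade (so the new module is loaded) or reinstalling the matching packages would be the first thing to try.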

I don’t know what is causing kde, or any other desktop environment I have (cinnamon, mate, gnome), to break on here, but none of them start, as the kernel seems to be systemically loading the wrong driver mode and breaking everything.

I’ve tried adding kernel/grub flags to things as recommended (though many I’ve not needed), but it’s still broken. I’ve been forced to use my laptop for the past month as a result (arch too, though I dare not upgrade/install anything), but really would like to know if anyone else has an idea on this.

Here are links to the qdbus info, xorg logs, and nvidia bug reports (full and normal). I have no idea, other than to curse nvidia and get some amd-based card that seems to work fine with the OSS driver, as I was using before. I’ve not had issues upgrading arch on this system for 3 years; suddenly all hell just broke loose.

https://drive.google.com/drive/folders/1C-QcZQFeRZwchzYvvsLWFX9f7EW_ohB8?usp=sharing

-mb

According to the logs, the nvidia driver loads fine and xorg and kwin are running on the gpu. No current xorg logs are included.
Please post the output of
ps aux |grep X
Did you ever change the DM to something other than sddm?

The Xorg.0.log files were attached, a before working version, and after, once things broke. These are still valid, as I’ve not had a working desktop in a month, and the “before” was taken right before that broke. I agree, the driver seems fine, but not sure why the DM/compositors are trying to invoke it in EGL mode vs. normal GL…

I did try LXDM instead of SDDM just as a test, and same result, launching into Plasma, Cinnamon, or Mate, so definitely not unique to SDDM. I couldn’t get GDM to work to try.

Here’s the ps aux requested, this is with lxdm trying to launch mate currently (after login, and sitting at a black screen after).

[user@host ~]$ ps aux | grep X
root 93308 1.8 0.0 102276 47736 tty1 Ss+ 13:40 0:01 /usr/lib/Xorg -background none :0 vt01 -nolisten tcp -novtswitch -auth /var/run/lxdm/lxdm-:0.auth
user 93341 0.0 0.0 7104 3384 ? Ss 13:40 0:00 /bin/sh /etc/lxdm/Xsession mate-session
user 93369 0.0 0.0 6132 2364 pts/3 S+ 13:41 0:00 grep X

Compare the qdbus txt files by diffing them: if you grep on GL, you can see that kwin is now invoking the compositor entirely differently, and since then no DEs work. The odd thing is I hadn’t (knowingly) upgraded anything beforehand, though I find that occasionally installing some software with yaourt/yay slides in something that breaks. That’s why I did a full system upgrade with pacman -Syu, but to no avail, whether after the upgrade or after rolling back the kernel or driver.
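For anyone wanting to repeat that comparison, here is a minimal sketch of the grep-then-diff step. The function name gl_diff and the file names in the usage note are placeholders, not the names of the files in the shared folder.

```shell
#!/bin/sh
# Sketch: diff only the GL-related lines of two saved kwin support-info dumps.
# gl_diff is a hypothetical helper; pass it the "before" and "after" files.
gl_diff() {
    before=$(mktemp)
    after=$(mktemp)
    # Keep only lines that mention GL (matches OpenGL, GLX, EGL, GLSL, ...).
    grep 'GL' "$1" > "$before"
    grep 'GL' "$2" > "$after"
    diff "$before" "$after"
    status=$?
    rm -f "$before" "$after"
    return "$status"
}
```

A dump can be captured from a running session with `qdbus org.kde.KWin /KWin supportInformation > kwin_after.txt` and then compared with `gl_diff kwin_before.txt kwin_after.txt`. Lines starting with `<` come from the first file and lines starting with `>` from the second.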

I added a new “not_working” export of the current Xorg.0.log file to the gdrive link sent before, this is with the current 5.2.2 kernel and 430.34 drivers.

Thanks for having a look at this!

The xorg logs are rather old, from September.
Please remove the nomodeset kernel parameter and instead set nvidia-drm.modeset=1

Sorry, probably the last time it’s been rebooted (I run some vm’s on it that I keep going anyways), but still valid from boot.

Ok, I’d added that earlier, but I noticed I still had nomodeset set from early on, which had worked for years up to this point. I removed nomodeset, rebuilt grub and the kernel, and rebooted: same BSoD after logging in with that set.

No change basically. Here’s output of my /etc/default/grub if it helps, relevant bits at least. Commented bits are prior revisions.

cat /etc/default/grub

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Arch"
GRUB_CMDLINE_LINUX="cryptdevice=UUID=X:X root=UUID=X"

#GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset"

#GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset scsi_mod.use_blk_mq=1 intel_iommu=on iommu=pt nvidia-drm.modeset=1"

GRUB_CMDLINE_LINUX_DEFAULT="quiet scsi_mod.use_blk_mq=1 intel_iommu=on iommu=pt nvidia-drm.modeset=1"
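Editing /etc/default/grub only takes effect once the grub config has been regenerated and the machine rebooted, so it can help to confirm what the running kernel actually received. Below is a small sketch of that check. The helper names has_param and check_modeset are invented for illustration, and the script only inspects the command line; it does not change anything.

```shell
#!/bin/sh
# Sketch: report which modeset-related parameter the running kernel was booted with.
# has_param and check_modeset are hypothetical helper names.

# Succeed if PARAM ($2) appears as a whole word in the command line ($1).
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

check_modeset() {
    if has_param "$1" nomodeset; then
        echo "nomodeset still set"
    elif has_param "$1" nvidia-drm.modeset=1; then
        echo "nvidia-drm.modeset=1 set"
    else
        echo "no modeset parameter found"
    fi
}

# /proc/cmdline holds the parameters of the currently running kernel.
if [ -r /proc/cmdline ]; then
    check_modeset "$(cat /proc/cmdline)"
fi
```

On a typical Arch install the config is regenerated with `grub-mkconfig -o /boot/grub/grub.cfg`. Once the nvidia_drm module is loaded, `cat /sys/module/nvidia_drm/parameters/modeset` should print Y if the setting was picked up.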

Added a link to the Xorg.0.log file after reboot here: https://drive.google.com/open?id=1nGMEeBhwYoE3amuxQ-LZp8ZPXqkYzY54

Tried this with SDDM again vs. LXDM, but no diff.

Looks normal.
Two things to try:

  • disconnect all but one monitor, reboot
  • try with a newly created user with empty profile

Done and done, same result, black screen after login again.

Anything in journal?
Please run
sudo journalctl -b0 --no-pager >journal.txt
and attach the output file.
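A full journal dump is long, so a first pass that pulls out likely trouble spots can save time. This is just a sketch: filter_suspects is a made-up name, and the pattern list is a guess at what tends to matter for a session that fails to start, not a definitive set.

```shell
#!/bin/sh
# Sketch: keep only journal lines that look like errors relevant to a failed session.
# filter_suspects is a hypothetical helper; it reads journal text on stdin.
# The patterns are an assumption and can be extended as needed.
filter_suspects() {
    grep -Ei 'permission denied|failed to|cannot open|segfault'
}
```

It can be run over the saved file with `filter_suspects < journal.txt`, or directly on the current boot with `journalctl -b0 -p warning --no-pager | filter_suspects`, where -p warning limits the output to entries of warning priority or worse.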

Nothing interesting that I can see, just various KDE apps complaining they can’t connect to the desktop, presumably because Plasma isn’t starting fully before it hangs.

Here’s the output as requested…

This isn’t unique to KDE, but all DE’s bomb out mostly the same way I’ve observed with a hung desktop.

Looks like a permissions problem. Please post the output of
ls -l /dev/dri /dev/nvi*

Sure, here you go.

[user@host ~]$ ls -l /dev/dri /dev/nvi*
crw-rw-rw- 1 root root 195, 0 Oct 14 14:50 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Oct 14 14:50 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Oct 14 14:50 /dev/nvidia-modeset

/dev/dri:
total 0
drwxr-xr-x 2 root root 80 Oct 14 14:49 by-path
crw-rw----+ 1 root video 226, 0 Oct 14 15:25 card0
crw-rw-rw- 1 root render 226, 128 Oct 14 14:49 renderD128

Please check if your user is member of the video group.

I was not in the video group, only SDDM was, but that seems normal judging by my laptop. I added myself to video, and still no change with sddm or lxdm after login.
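Group changes only apply to sessions started after the change, so it is worth confirming the membership from a fresh login. A minimal sketch of that check follows; in_group_list is an invented helper name, and it works on the space-separated list that `id -nG` prints.

```shell
#!/bin/sh
# Sketch: test whether a group name appears in a space-separated group list.
# in_group_list is a hypothetical helper name.
in_group_list() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}
```

For example, `in_group_list "$(id -nG)" video && echo "in video"` checks the current session, while `id -nG user` shows what a new login for that user would get.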

Being in the video group would probably only be necessary if running gdm rootless X.
After trying to login, can you access the xserver from vt, running e.g.
DISPLAY=:0 xrandr
and does the ~/.Xauthority file get changed to match the contents of the Xserver’s auth file (the file named after -auth in the ps aux |grep X output)?
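Because both files are binary, comparing them through xauth is easier than using cat. The sketch below shows one way to do it: auth_file and cookies are made-up helper names, and reading the server's auth file will normally need root.

```shell
#!/bin/sh
# Sketch: find the X server's auth file and list cookies for comparison.
# auth_file and cookies are hypothetical helper names.

# Print the argument that follows -auth in a process listing read from stdin.
auth_file() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "-auth") { print $(i + 1); exit } }'
}

# Print the hex cookie (last field) of each entry in `xauth list` output on stdin.
cookies() {
    awk 'NF >= 3 { print $NF }' | sort -u
}

# Usage on the affected machine (commented out here because it needs root and xauth):
#   auth=$(pgrep -a Xorg | auth_file)
#   sudo xauth -f "$auth" list | cookies
#   xauth -f ~/.Xauthority list | cookies
```

If the user's file contains a cookie that also appears in the server's file, the session should be able to authenticate to the display.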

Sure, looks ok, ssh’d from my laptop launching as my normal user. Same as before, one display, no change to permissions after.

$ cat ~/.Xauthority
host0MIT-MAGIC-COOKIE-

$ ls -lah ~/.Xauthority
-rw------- 1 user user 53 Oct 15 09:03 /home/user/.Xauthority

$ DISPLAY=:0 xrandr
Screen 0: minimum 8 x 8, current 3840 x 2160, maximum 32767 x 32767
DVI-D-0 disconnected primary (normal left inverted right x axis y axis)
HDMI-0 disconnected (normal left inverted right x axis y axis)
DP-0 connected 3840x2160+0+0 (normal left inverted right x axis y axis) 1872mm x 1053mm
3840x2160 60.00*+ 59.94 50.00 29.97 25.00 23.98
4096x2160 59.94 50.00 29.97 25.00 24.00 23.98
1920x1080 60.00 59.94 50.00 29.97 25.00 23.98
1680x1050 59.95
1600x900 60.00
1440x900 59.89
1366x768 59.79
1280x1024 75.02 60.02
1280x800 59.81
1280x720 60.00 59.94 50.00
1152x864 75.00
1024x768 75.03 70.07 60.00
800x600 75.00 72.19 60.32
720x576 50.00
720x480 59.94
640x480 75.00 72.81 59.94
DP-1 disconnected (normal left inverted right x axis y axis)
DP-2 disconnected (normal left inverted right x axis y axis)
DP-3 disconnected (normal left inverted right x axis y axis)
DP-4 disconnected (normal left inverted right x axis y axis)
DP-5 disconnected (normal left inverted right x axis y axis)

$ ls -lah ~/.Xauthority
-rw------- 1 user user 53 Oct 15 09:03 /home/user/.Xauthority

Oh, no change to the process ownership either before/after running the local xrandr command.

$ ps aux | grep X
root 15195 1.7 0.0 102232 47832 tty1 Ss+ 22:13 0:00 /usr/lib/Xorg -background none :0 vt01 -nolisten tcp -novtswitch -auth /var/run/lxdm/lxdm-:0.auth
user 15223 0.0 0.0 7104 3508 ? Ss 22:13 0:00 /bin/sh /etc/lxdm/Xsession mate-session

So your user has access to the xserver, yet the DE doesn’t start. I’m out of ideas right now and would rather recommend a reinstall.