Black or incorrect textures in KDE

Sometimes, black or incorrect textures appear in KDE instead of window contents or other interface elements, like widgets on the panel. One easy way to reproduce the problem is this:

  • Toggle KWin compositing off and then on again with a hotkey (Alt+Shift+F12 by default).
  • Have multiple windows open, from different applications. I typically use Firefox, Thunderbird, QtCreator, Kate and other KDE applications for testing. Preferably, one window should be maximized in the background.
  • Open new windows or resize windows with the mouse (preferably enlarging them). At this point you should see some windows briefly flash black. After prolonged use, you will notice that some windows or regions of the screen stay black instead of just flashing. This happens especially often when resizing windows.

This can be reproduced on a freshly booted system, with no suspend or VT switching involved.
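For anyone trying to turn the steps above into a repeatable recipe, here is a rough Python sketch that drives them with xdotool. It assumes xdotool is installed, that compositing is still toggled with KWin's default Alt+Shift+F12 shortcut, and that resizing a window with `xdotool windowsize` stresses the same code path as dragging its border with the mouse; that last point is untested. `build_plan` and `run_plan` are made-up names for this example.

```python
import subprocess
import time
from typing import List, Sequence, Tuple

# Assumption: KWin's default "Suspend Compositing" shortcut (Alt+Shift+F12)
# has not been remapped.
TOGGLE_COMPOSITING = ["xdotool", "key", "alt+shift+F12"]


def build_plan(window_ids: Sequence[str], steps: int = 10,
               start: Tuple[int, int] = (640, 480),
               end: Tuple[int, int] = (1600, 1000)) -> List[List[str]]:
    """Return xdotool command lines: toggle compositing off and back on,
    then enlarge every window in `window_ids` from `start` to `end`."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    plan = [list(TOGGLE_COMPOSITING), list(TOGGLE_COMPOSITING)]
    for i in range(steps + 1):
        # Linear interpolation from the start size to the end size.
        w = start[0] + (end[0] - start[0]) * i // steps
        h = start[1] + (end[1] - start[1]) * i // steps
        for wid in window_ids:
            # Assumption: programmatic resizing behaves like a mouse drag.
            plan.append(["xdotool", "windowsize", wid, str(w), str(h)])
    return plan


def run_plan(plan: List[List[str]], delay: float = 0.2, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them with a pause in between."""
    for argv in plan:
        if dry_run:
            print(" ".join(argv))
            continue
        subprocess.run(argv, check=True)
        time.sleep(delay)  # give KWin time to react before the next step
```

Window IDs can be collected with, for example, `xdotool search --name Kate`; `run_plan(build_plan(ids), dry_run=False)` then executes the sequence.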

Incorrect textures are harder to reproduce and sometimes appear as parts of the interface that stop updating unless interacted with. For example, the network bandwidth indicator on the panel stops being redrawn unless the panel is resized or the indicator is interacted with so that it has to be redrawn. I don’t have a step-by-step way to reproduce this; it just happens sometimes after a few days of uptime on a system that is in active use.

This is not a new problem; it has been present for many years across different versions of the Nvidia driver, KDE, Qt and X.org. The way the problem manifests has varied over time, but black textures have always been present in one form or another. It has been reported to KDE developers multiple times, and they refer to Nvidia:

https://bugs.kde.org/show_bug.cgi?id=386752
https://bugs.kde.org/show_bug.cgi?id=354731
https://bugs.kde.org/show_bug.cgi?id=347425

The problem in the most recent report (the first link) happens to me on Kubuntu 17.10 x86_64, Nvidia 387.22, GTX 980. The steps above trigger the problem on this system.

nvidia-bug-report.log.gz (299 KB)

Yeah, you are not alone; KDE KWin with Nvidia is a disaster. Everything from crashing compositing to crashing Qt rendering in Plasma.

Restarting Plasma is a quick and dirty fix, but I noticed that you can run XRender instead of OpenGL, and most if not all issues are gone.

So currently the fix is to switch to AMD, run XRender, or wait for Nvidia to fix their driver, but this issue has been going on for years. It’s actually the one thing that stopped me from upgrading from a 970 to a 1080 Ti. And if nothing happens in this department, my next-gen card will be AMD.

Similar issues:
https://devtalk.nvidia.com/default/topic/1025735/linux/kde-compositing-crash-on-nvidia-drivers-384-90-/
https://devtalk.nvidia.com/default/topic/1026035/linux/nvrm-xid-pci-0000-2a-00-31-ch-00000028-engmask-00000101-intr-10000000-w-kwin-amp-kde/

XRender means software rasterization, so it’s not a reasonable long-term option.

No one said long-term; at least this will confirm that it’s the OpenGL code in KWin triggering a bug in the driver.

Long-term, it’s either a fix or an AMD GPU with open drivers.

This problem only triggers with Nvidia for me. I have another laptop with an Intel GPU, and I’ve been using a laptop with an AMD GPU for some time; the problem only appeared with Nvidia.

I don’t have the knowledge to localize the problem myself, but it appears to be somewhere in KDE, Qt or the Nvidia driver. Switching to software rendering is not an option because, given the history of this problem, which has existed for many years, I would end up using the software renderer permanently. This is not why I bought a GTX 980. I will more likely switch to another WM/DE/GPU.

I don’t have corruption issues with the latest KDE Plasma on Arch with Nvidia and a GTX 1070, as long as I don’t use standby mode. X not refreshing correctly after some time with KWin compositing turned off doesn’t seem to be an Nvidia problem; I get this with intel-modesetting too.
But after standby, it’s clearly an Nvidia problem.

In terms of performance, KWin actually works best of all compositors with Nvidia (well, I didn’t test all the old stuff).
Try this workaround for fixing Vsync with OpenGL compositing instead of other weird hacks:
https://wiki.archlinux.org/index.php/KDE#Screen_tearing_with_Nvidia
The X/KWin frame rate still drops when opening new windows, but I don’t think this mess can be fixed by anyone other than Nvidia; it’s not a KDE Plasma issue.

You can also bind

export __GL_YIELD="USLEEP" && kwin_x11 --replace ; pkill plasmashell && plasmashell

to a custom hotkey to work around the corruption (you might have to point the hotkey at a script because of the environment variable in the first place). It’s certainly not great, no.
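A rough Python version of such a script, for reference. It is a sketch under a few assumptions: `kwin_x11 --replace` keeps running in the foreground as the window manager (so it is started detached here instead of being waited on), `pkill`/`pgrep` are available, and `__GL_YIELD=USLEEP` is passed to plasmashell as well, just as the exported variable in the one-liner does. `kwin_env` and `restart_desktop` are hypothetical names.

```python
import os
import subprocess
import time


def kwin_env(base=None):
    """Return a copy of the environment with __GL_YIELD=USLEEP set."""
    env = dict(os.environ if base is None else base)
    env["__GL_YIELD"] = "USLEEP"
    return env


def restart_desktop(timeout: float = 5.0):
    """Replace KWin and restart plasmashell, both with __GL_YIELD=USLEEP."""
    env = kwin_env()
    # Assumption: kwin_x11 --replace keeps running as the window manager,
    # so it is started detached instead of being waited on.
    kwin = subprocess.Popen(["kwin_x11", "--replace"], env=env,
                            start_new_session=True)
    # No check=True: pkill exits with status 1 if plasmashell was not running.
    subprocess.run(["pkill", "-x", "plasmashell"])
    deadline = time.monotonic() + timeout
    # Wait (up to `timeout` seconds) for the old plasmashell to exit.
    while (subprocess.run(["pgrep", "-x", "plasmashell"],
                          stdout=subprocess.DEVNULL).returncode == 0
           and time.monotonic() < deadline):
        time.sleep(0.1)
    plasma = subprocess.Popen(["plasmashell"], env=env, start_new_session=True)
    return kwin, plasma
```

Saving this as, say, ~/bin/restart-kwin.py with a final `restart_desktop()` call and pointing the custom shortcut at `python3 ~/bin/restart-kwin.py` should behave like the one-liner, minus the blocking on kwin_x11.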

@aufkrawall:

  1. The loss/corruption of textures after standby or VT switch is a separate problem, which is covered in https://bugs.kde.org/show_bug.cgi?id=344326. Reportedly, this should be fixed since KWin developers added support for the Nvidia-specific extension NV_robustness_video_memory_purge. I’m specifically having the problem without VT switching or suspend, and I’m not discussing the problem in those use cases here.

  2. I’m not having problems with Vsync; this thread is not about Vsync. For the record, I’ve enabled Vsync in nvidia-settings and added the following lines to /etc/environment:

KWIN_TRIPLE_BUFFER=1
KWIN_USE_BUFFER_AGE=0
__GL_SYNC_TO_VBLANK=1
__GL_SYNC_DISPLAY_DEVICE="DFP-0"

and never had Vsync problems ever since. (The last line indicates the monitor I want to sync with; you may have a different name for your display - do not use these lines verbatim.)

Side note: KWIN_USE_BUFFER_AGE is needed because apparently the buffer age extension (probably GLX_EXT_buffer_age, I don’t know for sure) doesn’t work reliably with Nvidia, which occasionally causes some texture flickering. This happens with KWin and other window managers that use this extension.
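For readers unfamiliar with buffer age, here is a simplified Python sketch of how a compositor typically uses it (illustrative only, not KWin's actual code). The driver reports how many swaps ago the current back buffer was last presented, with 0 meaning its contents are undefined, and the compositor repaints the current damage plus the damage of the frames drawn since then. If the reported age is wrong, say 1 when the buffer actually holds an older frame, the regions damaged in between are never repainted, which would look like the stale or flickering areas described here. Rectangles are kept as plain tuples without merging to keep the sketch short; `DamageHistory` is a hypothetical name.

```python
from typing import List, Optional, Set, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


class DamageHistory:
    """Simplified buffer-age repaint logic (illustrative, not KWin's code)."""

    def __init__(self, max_age: int = 4):
        self.max_age = max_age
        self.history: List[Set[Rect]] = []  # damage per frame, newest first

    def repaint_region(self, age: int, damage: Set[Rect]) -> Optional[Set[Rect]]:
        """Rectangles to repaint for this frame, or None for a full repaint.

        A buffer of age N holds the frame presented N swaps ago, so the damage
        of the N - 1 frames drawn since then must be repainted too."""
        if age == 0 or age - 1 > len(self.history):
            return None  # undefined contents or not enough history
        region = set(damage)
        for past in self.history[:age - 1]:
            region |= past
        return region

    def end_frame(self, damage: Set[Rect]) -> None:
        """Remember this frame's damage for later buffer-age lookups."""
        self.history.insert(0, set(damage))
        del self.history[self.max_age:]
```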

Yeah, I didn’t want to hijack your thread with the vsync thing. It’s just that I was suspecting that you are using some specific hacks (like everyone else does, since Nvidia seems entirely incompetent to design their driver to work correctly with any X11 compositor…).
I’m very certain that I don’t suffer any corruption issues apart from that standby thing (I never saw anything turn black after toggling compositing off), so I guess it’s not a general problem, and perhaps there’s even a connection to your specific Vsync hacks.
By the way, forcing triple buffering reproducibly leads to stuttering here when moving windows.

As far as we are aware, KWin isn’t being tested by its developers on NVIDIA hardware, despite our offers to provide hardware for free. This means that some KWin bugs aren’t caught in time, or at all, and that we never get proper bug reports from KWin developers when they feel that our driver, not KWin, is at fault for a particular problem.

For example, if GLX_EXT_buffer_age is claimed not to work reliably on the NVIDIA driver, we’re eager to hear about that and fix it, but we need a real bug report for that, that exposes the problem.

As for the Xid 31 reports with KWin, we’re investigating, but we’re still looking for a reliable way to reproduce these issues. We haven’t managed to observe the problem locally so far, so we’d like to hear about any reliable recipe for reproduction. It isn’t known at this point whether the issue is an NVIDIA or a KWin bug.

Thanks

Maybe fix X11 performance in general first?
Example: Take any desktop environment, with or without OpenGL window compositing. Play a video with any player in a window. Open a new window with any program, e.g. your text editor or file browser.
-> The video stutters badly while that happens, because Xorg performance drops.

This only happens with the proprietary Nvidia driver; even Nouveau doesn’t show this behavior.
It also happens with very small windows, such as tooltip overlays and pop-ups in programs.
SO ANNOYING!!

@ahuillet:

Thanks for the reply. At least, now I know Nvidia is aware of the problem.

Are you able to reproduce the black textures problem the way I described?

For example, if GLX_EXT_buffer_age is claimed not to work reliably on the NVIDIA driver, we’re eager to hear about that and fix it, but we need a real bug report for that, that exposes the problem.

Yes, I would love this problem to be fixed as well. Unfortunately, I don’t know any reliable way to reproduce it, other than using KDE with KWin and my settings above (minus KWIN_USE_BUFFER_AGE) on a daily basis. I can also reproduce the occasional texture flickering if I replace KWin with xfwm4+compton, where compton performs compositing with GLX_EXT_buffer_age (in ~/.config/compton.conf, the parameter glx-swap-method = "-1"; should be set). The problem is that the flickering does not show up right away and may appear after an hour, a day or a few days of system use. Do you have any suggestions for how I can help with this problem?

For the Xid 31 issues, in my experience, launching certain OpenGL apps, Steam chief among them, seems to be the trigger. After a fresh reboot, closing and reopening Steam a few times is usually enough to trigger it. It is triggered less often by launching other OpenGL apps, such as fullscreen games.

My setup:

  • GTX 1080, kwin 5.11.3, nvidia 387.22
  • Disable Vsync in nvidia-settings and enable Allow Flipping. G-Sync is enabled, but multiple monitors are in use, so it is effectively disabled.
  • Set KDE compositing to "Automatic" tear prevention, OpenGL 3.1
  • KWIN_TRIPLE_BUFFER=1 kwin_x11 --replace
  • Bind a key in Global Shortcuts -> Custom Shortcuts to the above (KWIN_TRIPLE_BUFFER=1 kwin_x11 --replace) if you value your sanity, as every time this happens the display stops updating until KWin is relaunched. VT switching to get a console to kill KWin is risky: after doing this a few times, Vsync fails (like, forever, until a reboot(?)) and the frame rate drops everywhere. Additionally, VT switching seems to give a ~10% chance that the driver never comes back.

At this point, launch Steam, log in, exit Steam, and repeat a few times. On a fresh reboot, across two different systems (both with a 1080), I’ve never survived longer than a few Steam launches. It seems to get worse as time goes on, until any OpenGL app starting up triggers the error, making the above keybind a necessity.
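To make a Steam loop like this easier to score, here is a Python sketch that counts Xid events in kernel log text. The line format is an assumption based on the Xid report linked earlier in the thread (`NVRM: Xid (PCI:0000:2a:00): 31, Ch 00000028, ...`); the exact wording may differ between driver versions. `count_xids` and `new_xid31` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumed kernel log format, e.g.:
#   NVRM: Xid (PCI:0000:2a:00): 31, Ch 00000028, engmask 00000101, intr 10000000
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9A-Fa-f:.]+)\): (\d+)")


def count_xids(log_text: str) -> Counter:
    """Count Xid events per (PCI address, Xid code) in kernel log text."""
    return Counter((m.group(1), int(m.group(2)))
                   for m in XID_RE.finditer(log_text))


def new_xid31(before: str, after: str) -> int:
    """Number of Xid 31 events that appeared between two log snapshots."""
    diff = count_xids(after) - count_xids(before)  # keeps positive counts only
    return sum(n for (_, code), n in diff.items() if code == 31)
```

For example, save the output of `journalctl -k -b` (or `dmesg`) before and after each Steam launch and call `new_xid31(before, after)`; a nonzero result marks the launch that triggered the error.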

(While we’re on the subject, KWin claims Vsync doesn’t function correctly under the Nvidia driver unless you set __GL_YIELD=usleep or use KWin’s triple buffering; no idea if that is relevant to this issue.)

BTW, I’ve found the KDE bug that recommended the KWIN_USE_BUFFER_AGE workaround:

https://bugs.kde.org/show_bug.cgi?id=363500

It also contains a video showing the problem. I’m posting it here just in case someone is able to salvage any technical hints from it (or the bug it is linked as a duplicate of).

The claim that __GL_YIELD=usleep is required points at an application bug, possibly a race condition due to missing synchronization.

I’ve tried both Nephyrin’s and Lastique’s instructions. On an Arch Linux system updated today (KWin 5.11.3), I’ve opened many different windows, one maximized in the background, and resized windows without seeing any black flicker or fully black window.
I’ve started Steam and switched between tabs, then restarted Steam about ten times, without observing either a black window or a Xid error.

NVIDIA driver 387.22 on a GeForce GTX 770.

@ahuillet:

I tried building KWin 5.11.3 from source on my Kubuntu 17.10 and can still reproduce the black textures. Did you configure the environment variables as I described?

I didn’t think any environment variable was needed to reproduce the problem, only to solve a separate issue related to VSync?

The environment variables work around multiple issues, one of which is Vsync. I don’t know the source of the problem, so I cannot tell whether any of the variables are essential. But I can tell that I can reproduce the problem with these variables, so for now I consider all of them essential. You may also need to set up xorg.conf. I have only /etc/X11/xorg.conf.d/20-nvidia.conf with the following content:

Section "Device"
    Identifier "Default nvidia Device"
    Driver "nvidia"
    Option "NoLogo" "True"
    Option "CoolBits" "12"
    Option "TripleBuffer" "True"
EndSection

FYI, it’s not just KWin that suffers from the buffer_age issue. Enlightenment does too. We’ve used buffer age with both EGL and GLX drivers for a long time, and at least the commercial EGL drivers for ARM systems (Mali, Imgtec) that supported it worked fine. It’s the exact same rectangle update/history tracking logic for both paths, but I notice that sometimes we get an “old frame that is different from what buffer age says it is”, and this results in some parts of the screen going into a flicker-fest every time any update happens, until that area of the screen is redrawn. Indeed, the workaround is to force “full updates” in the settings panel.

Trying to figure out if it is the driver or not is really hard because we basically would have to keep a history log of the last N frames of backbuffers (literally read all the pixels and store them) before and after render, and if you “see the bug” then dump them all out and hope history went back far enough. That means N needs to be non-trivial (like maybe 100-500 buffers so we could go back a few seconds in time to the issue if we dump via some hotkey). It’s kind of a nasty thing to run all day on your desktop just to catch when this happens maybe 3-5 times per day on average.
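Some rough numbers for the history log described above, as a Python sketch. It assumes 4 bytes per pixel (RGBA8) and two full copies per frame (one read before rendering, one after); `snapshot_cost_bytes` and `FrameHistory` are hypothetical names, and a `collections.deque` with `maxlen` handles the "keep only the last N" part.

```python
from collections import deque


def snapshot_cost_bytes(width: int, height: int, frames: int,
                        copies_per_frame: int = 2, bytes_per_pixel: int = 4) -> int:
    """Memory for `frames` frames of full back-buffer copies.

    Assumptions: RGBA8 (4 bytes per pixel) and two copies per frame,
    one read before rendering and one after."""
    return width * height * bytes_per_pixel * copies_per_frame * frames


class FrameHistory:
    """Keep only the last `capacity` snapshots; dump them when the bug is seen."""

    def __init__(self, capacity: int):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.frames = deque(maxlen=capacity)

    def record(self, frame_no: int, age: int, before: bytes, after: bytes) -> None:
        self.frames.append((frame_no, age, before, after))

    def dump(self) -> list:
        """Return the recorded snapshots, oldest first, and start over."""
        out = list(self.frames)
        self.frames.clear()
        return out
```

Under those assumptions, one 5120x1440 buffer is 29,491,200 bytes, so 100 to 500 frames of before/after copies come to 5,898,240,000 to 29,491,200,000 bytes (roughly 5.5 to 27.5 GiB).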

If the Nvidia driver had more probes and ways of digging into its logic… If the source existed, I’d do just that: add things like a frame display count to each back buffer (some monotonically increasing integer for each frame swapped) and then have some logic to check that buffer age matches this counter. That’s cheaper than reading an entire 5120x1440 buffer twice per frame… :)
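The counter idea can be sketched in Python as plain bookkeeping. It assumes the GLX_EXT_buffer_age convention that, when frame k is about to be drawn, a back buffer of age N holds frame k - N, and that age 0 (undefined contents) is always acceptable because it only forces a full repaint. The buffer IDs stand in for whatever the driver tracks internally; `BufferAgeChecker` is a hypothetical name.

```python
from typing import Dict, Hashable


class BufferAgeChecker:
    """Cross-check reported buffer ages against per-buffer frame stamps.

    Assumed convention (as in GLX_EXT_buffer_age): when frame k is about to be
    drawn, a back buffer of age N holds frame k - N; age 0 means undefined."""

    def __init__(self) -> None:
        self.next_frame = 1                      # frame about to be drawn
        self.stamps: Dict[Hashable, int] = {}    # buffer id -> frame it holds

    def check(self, buffer_id: Hashable, reported_age: int) -> bool:
        """True if `reported_age` is consistent with what the buffer holds."""
        if reported_age == 0:
            return True   # undefined contents only force a full repaint
        stamp = self.stamps.get(buffer_id)
        if stamp is None:
            return False  # nonzero age for a buffer never drawn into
        return self.next_frame - reported_age == stamp

    def swapped(self, buffer_id: Hashable) -> None:
        """Stamp `buffer_id` with the frame just drawn and move to the next one."""
        self.stamps[buffer_id] = self.next_frame
        self.next_frame += 1
```

In a two-buffer swap chain, for example, frame 3 lands in the buffer that held frame 1, so the only consistent nonzero age is 2; a reported age of 1 would be flagged as a mismatch.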

If Nvidia has some way to get more info like this out of the driver (perhaps an engineering/debug build), it’d be really helpful. The fact that both KWin and Enlightenment suffer indicates the bug MIGHT be in the driver. The buffer_age logic actually lives in EFL, so applications would also be affected if the compositor were not redirecting them. This is the case in mobile, TV and similar environments, where apps spend most of their life unredirected, and the same buffer_age logic works fine there, as described above.

Well, I’ve had texture corruption in Civilization 5 after switching to a text VT and back to the Xorg one.
I’ve had the same with Portal and Half-Life 2.
If the issue is KWin’s fault, then it is also a Civilization 5 fault and a Source engine fault, and god only knows how many more could be added to the list.

Or maybe it is just nvidia, who knows (meh).

XRender has been GPU-accelerated for 10 years:
https://www.phoronix.com/scan.php?page=article&item=934&num=2

Paired with ForceFullCompositionPipeline and some tweaks in kwinrc (MaxFPS=60 or whatever), it is a nice alternative to OpenGL compositing.
But it only offers translucency, so effects like Magic Lamp, Blur, Desktop Cube and Wobbly Windows will not work.