I kept wondering about the second half of my theory myself, so let me explore it in the next paragraphs.
Looking at DXVK, it seems that it can allocate Vulkan memory from different pools. One pool is “device local”. The question is: what does this actually mean? Is the memory merely preferred to reside in video memory, or is it pinned there? If it is pinned, applications need to be aware that an allocation can fail, because that pool cannot be overcommitted.
And if desktop environments render with Vulkan, maybe they should not put everything into the “device local” pool at high priority. After all, it’s just window surface rendering, maybe with some fancy shader effects. I’d even say most of it could be CPU-rendered without any noticeable performance cost.
At least KDE Plasma renders with OpenGL - and that doesn’t have this kind of priority mechanism as far as I know. That’s why kcmshell6 qtquicksettings helped me a lot: set to “software rendering”, it forces QtQuick components to be rendered on the CPU - so, as far as I understand, memory can be allocated with wl_shm and the driver is allowed to swap the buffers back and forth as needed. It also offers “Vulkan rendering”, but that quickly becomes a memory budget bottleneck again - maybe because surfaces are then allocated from the device-local pool?
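For what it’s worth, the same software-rendering switch can be made per application or per session via Qt’s documented `QT_QUICK_BACKEND` environment variable, without going through the KCM - a sketch, assuming a session where you can inject environment variables (e.g. via `~/.config/environment.d/`):

```shell
# Force QtQuick's CPU raster backend instead of the GPU backend.
# QT_QUICK_BACKEND is a documented Qt environment variable;
# "software" selects the software renderer.
export QT_QUICK_BACKEND=software

# Any Qt Quick application launched from this shell now renders on the CPU.
echo "QtQuick backend: $QT_QUICK_BACKEND"
```

The KCM has the advantage of applying the setting system-wide and persistently; the environment variable is handy for testing a single misbehaving application first.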
Everything beyond this becomes advanced virtualization of memory - and that’s probably where the NVIDIA driver historically has non-existent or inefficient code paths. With Xorg, there were tricks where NVIDIA used shadowed buffers (which required newer versions of glibc); I don’t think this exists on Wayland.
That said, even if some people reported better results with Xorg, I switched to Wayland because it worked much better (once it became stable enough to work with). With Xorg, I had severe slowdowns when video memory became full and bus usage spiked. The NVIDIA driver seemingly cannot swap memory back and forth as needed: if it has been allocated from video memory (which GL probably does), then it stays there. If it has been allocated from system memory, then it stays there, too. This results in performance tanking once you hit the bottleneck, without recovering unless you manage to free all memory (by closing all applications). With Wayland, this works much better.
From a non-NVIDIA perspective: I’m also running a system with an Intel iGPU. During the early days of Plasma Wayland, I had a lot of crashes, black screens, or windows that would simply stop updating - unless I closed all windows or restarted the session. This has since been fixed, and I don’t think it was fixed because “Intel started to support shared memory” - it always supported that (it’s an integrated GPU; it has always worked that way). It was fixed because Plasma and kwin_wayland were fixed to properly handle memory budgets and allocation failures. But I think there are still opportunities to handle this better, which would benefit the current situation with NVIDIA.
I’m not saying that NVIDIA handles memory perfectly - it clearly doesn’t. But I think there’s still a lot of homework to be done on the other side, too. And that can’t simply be dismissed with “it works with AMD” - I think it doesn’t, once memory pressure becomes high enough. I could probably even still crash my Intel iGPU desktop if I managed to push video memory pressure high enough.
There are just so many moving parts, knobs to turn, and interacting code paths that it is highly difficult to coordinate everything. The NVIDIA driver still supports Xorg as a first-class citizen, and that probably ties up a lot of old legacy code which cannot easily be transformed to work with the modern memory management Wayland needs. And there’s CUDA, which is probably the number one priority because “compute clusters”.
What I want to say: the NVIDIA driver will need more time to optimize this, and it can support “shared memory”. But I don’t think the opportunity to improve this lies solely with NVIDIA - Wayland compositors and clients can still improve, too. And they can probably do that more easily and faster, which would give us NVIDIA users some more air to breathe. But NVIDIA really needs to act on this, maybe get more involved in Wayland projects, or prioritize its open source driver involvement. I won’t wait for an improvement forever. I can see progress in baby steps, and that keeps my hope alive - but if I needed to buy a new GPU today, it probably wouldn’t be green.
For now, I’m waiting for the DX12 shader improvements to find their way into the Vulkan specs and then be picked up by vkd3d and DXVK. I think we’re going to get a huge performance boost from that in some specific scenarios (even on other GPUs, just not that “huge”) which may sometimes look like memory pressure but really isn’t. Memory hasn’t been a major issue for me for some time now; games can usually handle their budgets well. There are a few exceptions, like Elite Dangerous, which is notoriously bad at handling the video memory budget and flips the GPU into turtle mode once it overshoots. But I can work around that by limiting what DXVK reports as free memory. Usually, I need to subtract exactly the amount of memory that KDE Plasma (or rather the desktop) is using - which includes keeping runaway memory under control with QtQuick software rendering, so it won’t grow while I’m running games.
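For reference, a dxvk.conf sketch of that workaround using DXVK’s `dxgi.maxDeviceMemory` option (values are in megabytes). The numbers are of course system-specific - here I assume, purely as an example, a 12 GiB card with roughly 1.5 GiB reserved for the desktop:

```ini
# dxvk.conf - point DXVK at it via the DXVK_CONFIG_FILE environment
# variable, or place it next to the game executable.

# Cap the amount of VRAM DXVK reports to the game.
# Assumption for this example: 12 GiB GPU, ~1.5 GiB used by the
# desktop, so the game gets to see ~10.5 GiB (10752 MB).
dxgi.maxDeviceMemory = 10752
```

This doesn’t prevent the game from allocating more - it just makes a budget-aware engine plan around a smaller pool, so it stops short of the point where the GPU falls off the performance cliff.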
But that brings me back to the homework part: KDE Plasma still has runaway video memory allocation issues - and those probably need to be fixed by the KDE devs. It looks like niri may have a similar problem, according to one report here. To me, those issues look like devs relying on the GPU driver to move memory out of the way, just pushing the problem aside (“we just allocate, the driver can store that memory somewhere”), instead of trying to stay within reasonable budgets.
I’m not saying that KDE Plasma or others have memory leaks and are too sloppy to deallocate memory - they probably track that properly in most cases. But they also don’t think a lot about budgets. And there are still issues with video memory not being fully freed when closing applications, so the driver probably leaks memory, too.
But in essence: There’s homework to do for both sides…