Non-existent shared VRAM on NVIDIA Linux drivers

There’s definitely something “funky” with Wayland.

I mean, I don’t want to go off topic from this thread, but it’s related in the sense that it exposes just how many moving parts end users need to be aware of to get things to maybe work.

With niri (Wayland) I launched the video game Silksong from Steam, straight up. No gamescope and no Proton. It runs worse than it does on Windows on the same hardware: lots of micro-stutters, and it just doesn’t feel smooth despite reporting 60 FPS.

However, the real problem is a constant 150–200 ms keyboard input delay; the mouse is fine. This keyboard delay is a showstopper, and it only happens in games.

I spent 4 hours debugging this trying every combination of every NVIDIA env var you can imagine. I tried gamescope but gamescope wouldn’t even run, it crashed. I tried a ton of stuff. Then I tried Proton and it made no difference, the keyboard delay was there.

Then I was like screw this, I’ve put a week into trying to get a stable system and I was ready to go back to Windows 10, so I tried KDE Plasma as a last effort. With KDE Plasma on Wayland, the keyboard delay went away. So something about niri handles this differently even though they both use Wayland, and I’m pretty sure Plasma runs the game through XWayland just like niri does.

Then I tried KDE Plasma on X11 and there was also no keyboard input delay, but the frame rate felt worse than with Wayland despite reporting 60 FPS. Now there was visual tearing, and toggling v-sync on/off made no difference. Also, gamescope no longer crashed, but with it the game ran at 15 FPS instead of the 60 FPS I got without it.

You know what the experience is like on Windows? Just install the OS, boot up and open whatever you want without issues and play any game your hardware will physically be able to run at an acceptable frame rate.

I’m not saying to do that, but if we want Linux on the desktop to succeed in the end, I think there’s still a long way to go.

Not sure if this is the same problem we’re talking about. I do not think this is related to X11 or Wayland. This is NVIDIA driver related, at least in my case.

Squad:
I normally play in GNOME Wayland, on CachyOS. I used to play on X11 before; the experience was inferior. But since some of you reported that X11 behaves correctly, I installed XFCE with an Xorg session and tested Squad there.

Squad issues:
VRAM gets filled to 8 GB most of the time. This causes system freezing (frame holds) until something frees the memory. Most of the time it can recover now; it used to be that Squad would stutter down to 20 FPS once VRAM filled, nothing would free it, and nothing could get it back. That issue is gone on Wayland GNOME at least: the game just freezes while VRAM is full, but continues once usage drops.

My test with XFCE X11 made me realize that the old issue still exists in that mode: Squad started stuttering down to 10–20 FPS with MangoHud showing 8 GB of VRAM in use, and nothing helped. Up to that point, the freezes were the same as on Wayland.

So Wayland is superior here, and at least THIS VRAM issue is entirely an NVIDIA driver issue.

(Of course I have to add that windows works fine no issues)

What everyone is circling around is VRAM-exhaustion behavior in NVIDIA’s Linux driver. It’s not a missing feature (AMD and Intel have it working), it is a malfunction.

AI says something about “host-visible memory mapped through GPU page tables” I know nothing about.

Last edit hopefully :D Just to add that Squad is not the only game I’ve observed this behavior in. It’s just the game where the issue shows up most readily.

I’m just going to drop this AI “slop” here, because I have not seen any NVIDIA posts talking about it like this:

The Technical Reality

What “shared memory” or “system RAM fallback” actually means:

When VRAM fills up, the GPU doesn’t just magically use system RAM. Instead:

  1. Host-visible memory (system RAM) is mapped into the GPU’s address space via GPU page tables (similar to how CPU virtual memory works)
  2. The GPU can then access this memory, though with significantly higher latency than VRAM
  3. This is what AMD calls the GTT (Graphics Translation Table) domain and Intel calls PPGTT (Per-Process Graphics Translation Table)

NVIDIA’s Implementation Problem

The issue isn’t that NVIDIA can’t do this—they clearly can on Windows. The problem is how (or if) they’re implementing it on Linux:

Possible failure modes:

  1. Not mapping system RAM through GPU page tables at all when VRAM exhausts
  2. Mapping it but with broken page table management (causing hangs/stutters)
  3. Failing to migrate/evict existing VRAM allocations to make room for new ones
  4. Different behavior between display servers due to how memory is requested through different APIs (DRM/GBM for Wayland vs older X11 paths)

Why It Matters

When the driver fails to properly map host memory:

  • Allocation requests fail hard → applications crash
  • GPU stalls waiting for memory → freezing/stuttering
  • No graceful degradation → performance cliff instead of smooth slowdown

Compare this to AMD/Intel on Linux or NVIDIA on Windows, where the performance degrades smoothly as the GPU accesses slower system RAM, but nothing crashes.

The perryman337 X11 Test

The fact that perryman337 still experienced issues on X11 suggests the problem might be deeper than just Wayland’s buffer management. It could be:

  • NVIDIA’s fundamental page table management on Linux is broken
  • The driver isn’t properly evicting old allocations from VRAM
  • There’s no proper memory overcommit mechanism like AMD has

What NVIDIA Needs to Fix

At the driver level, they need to:

  1. Properly implement GPU page table mapping for system RAM when VRAM exhausts
  2. Handle memory pressure by evicting least-recently-used allocations from VRAM
  3. Allow memory overcommitment with transparent migration between VRAM and system RAM
  4. Return host-visible memory when device-local fails instead of rejecting allocations

This is all proven technology that works on their Windows driver and on AMD/Intel Linux drivers. There’s no technical reason it can’t work on NVIDIA Linux drivers except that they apparently haven’t prioritized fixing it.


I’d say the first half of the theory is bang on correct, but the second half is open to interpretation.

[Also, this took me ages to write and edit, no AI use, so inaccuracies are my own!]

Modern VRAM management on modern GPUs is fully virtualised, so as far as I am aware, software can opt to explicitly use system RAM, but when asking for VRAM it can’t truly dictate that only dedicated GPU VRAM be used, provided the driver makes system RAM available as GPU shared memory. Applications can advise what to prioritise keeping in dedicated VRAM under pressure (VK_EXT_memory_priority) as well as check on resource availability (VK_EXT_memory_budget), but there’s no guarantee that actual dedicated VRAM will be used. Ironically, Windows users love to whine about this lack of guaranteed control on the NVIDIA forums (if only they knew how bad the alternative was!)

Most basic applications will fall over when additional resources can’t actually be used, but on modern systems that’s unlikely to be an issue when the OS can autonomously demand-page dedicated GPU memory contents into a large potential pool of shared GPU memory (Windows WDDM allows up to 50% of physical system RAM to be grabbed autonomously, and I think Linux AMDGPU allows up to 75%), in a way that’s also natively memory-managed from a CPU perspective (i.e. normal CPU-only RAM can also be compressed or paged out to disk to make room).

It’s like magic when it works, and the NVIDIA drivers on Windows (when using WDDM) definitely show off just how far one can go with very little VRAM, as practically everything can be rapidly evicted to make room for what you’re actually interacting with in an intelligent way without applications needing to be that intelligent in the first place. We just need NVIDIA to make this happen for Linux too.

I actually wondered about the second half of my theory in the next paragraphs.

Looking at DXVK, it seems that it can allocate Vulkan memory from different pools. One pool is “device local”. The question is what this actually means: is it memory preferred in video memory, or pinned to video memory? If it is pinned, applications need to be aware that an allocation can fail, because that pool cannot be overcommitted.

And if desktop environments render with Vulkan, maybe they should not put everything in the “device local priority” pool. After all, it’s just window surface rendering, maybe with some fancy shader effects. I’d even say most of it could be CPU-rendered without any reasonable performance cost.

At least KDE Plasma renders with OpenGL - and that doesn’t have this kind of priority as far as I know. That’s why kcmshell6 qtquicksettings helped me a lot: it forces QtQuick components to be rendered with the CPU if I set it to “software rendering” - thus memory can be allocated with wl_shm, as far as I understand, and the driver is allowed to swap the buffers back and forth as needed. It also offers “Vulkan rendering”, but that quickly becomes a memory budget bottleneck again - maybe because surfaces are allocated from the device-local pool?

Everything beyond this becomes advanced memory virtualization - and that’s probably where the NVIDIA driver historically has non-existent or inefficient code paths. With Xorg, there were tricks where NVIDIA had shadowed buffers (which required newer versions of glibc); I don’t think this exists on Wayland.

That said, even if some people reported better results with Xorg, I switched to Wayland because it worked much better (once it became stable enough to work with). With Xorg, I had severe slowdowns when video memory became full and bus usage spiked. The NVIDIA driver seemingly cannot swap memory back and forth as needed: if it has been allocated from video memory (which GL probably does), it stays there. If it has been allocated from system memory, it stays there, too. This results in performance tanking once you hit the bottleneck, without recovering unless you manage to free all memory (by closing all applications). With Wayland, this works much better.

From a non-NVIDIA perspective, I’m also running a system with an Intel iGPU. During the early days of Plasma Wayland, I had a lot of crashes, black screens, or windows that would simply stop updating - unless I closed all windows or restarted the session. This has since been fixed, and I don’t think it was fixed because “Intel started to support shared memory” - it always supported that (it’s an integrated GPU, it always worked that way). It was fixed because Plasma and kwin_wayland were fixed to properly handle memory budgets and allocation failures. But I think there are still opportunities to handle that better, which would benefit the current situation with NVIDIA.

I’m not saying that NVIDIA handles memory perfectly - it clearly doesn’t. But I think there’s still a lot of homework to be done on the other side, too. And that can’t just be dismissed with “it works with AMD”: I don’t think it does, once memory pressure becomes high enough. I think I could still crash my Intel iGPU desktop if I could drive video memory pressure high enough.

There are just so many moving parts, knobs to turn, and interacting code paths that coordinating everything is highly difficult. The NVIDIA driver still supports Xorg as a first-class citizen, and that probably ties up a lot of old legacy code which cannot easily be transformed to work with the modern memory management Wayland needs. And there’s CUDA, which is probably priority number one because “compute clusters”.

What I want to say: the NVIDIA driver will need more time to optimize this, and it can come to support “shared memory”. But I don’t think the opportunity to improve things lies solely on NVIDIA’s side. Wayland compositors and clients can still improve too, and they can probably do so more easily and quickly, which would give us NVIDIA users some more air to breathe. But NVIDIA really needs to act on this, maybe get more involved in Wayland projects, or prioritize open source driver involvement. I won’t wait on an improvement forever. I can see progress in baby steps, which keeps my hope alive, but if I needed to buy a new GPU today, it probably wouldn’t be green.

For now, I’m waiting for the DX12 shader improvements to find their way into the Vulkan specs and then get picked up by vkd3d and DXVK. I think we’re going to get a huge performance boost from that in some specific scenarios (even on other GPUs, just not that “huge”), which may sometimes look like memory pressure but really isn’t. Memory hasn’t been a major issue for me for some time now; games usually handle their budgets well these days. There are a few exceptions, like Elite Dangerous, which is notoriously bad at handling the video memory budget and flips the GPU into turtle mode once it overshoots. But I can work around that by limiting what DXVK reports as free memory. Usually, I need to subtract exactly the amount of memory that KDE Plasma (or rather the desktop) is using - including keeping runaway memory under control with QtQuick software rendering, so it won’t grow while games are running.
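For anyone wanting to replicate the DXVK workaround mentioned above: DXVK reads a dxvk.conf file, and the dxgi.maxDeviceMemory option (value in MiB) caps the VRAM size DXVK reports to the game. The number below is only an example (an 8 GB card minus roughly 1 GB of desktop budget) - adjust it to your own setup:

```ini
# dxvk.conf - place it next to the game executable, or point the
# DXVK_CONFIG_FILE environment variable at it.
# Caps the VRAM size DXVK reports to D3D11/DXGI games, in MiB.
# Example: 8 GB card, reserving ~1 GB for the desktop/compositor.
dxgi.maxDeviceMemory = 7168
```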

But that brings me back to the homework part: KDE Plasma still has runaway video memory allocation issues - and that probably needs to be fixed by the KDE devs. It looks like niri may have a similar problem, according to one report here. To me, those issues look like devs rely on the GPU driver to move memory away, just pushing the problem aside (“we just allocate, the driver can store that memory somewhere”), instead of trying to stay within reasonable budgets.

I don’t want to say with this that KDE Plasma or others have memory leaks, and are too sloppy to deallocate memory - they probably track that properly in most cases. But they also don’t think a lot about budgets. And there are still issues with video memory not being fully freed when closing applications, so the driver probably also leaks memory.

But in essence: There’s homework to do for both sides…

Yep, it does. I noticed GPU memory climbing over time while naturally using my system. Given my 750 Ti only has 2 GB, it means rebooting every few hours.

I just worry things won’t be addressed at the NVIDIA level in time because the 580 series drivers only have half a year of time left before they stop receiving updates.

With a problem that might have a lot of moving parts, expecting a solution in that time frame is pretty unlikely. Hopefully NVIDIA understands this is a real problem for anyone without a higher end video card and will apply the fix even after the deadline.

I’ve gone ahead and opened a bug report here: System memory is never used for GPU processes in Wayland causing crashes, but it works fine in X11 · Issue #185 · NVIDIA/egl-wayland · GitHub. I hope that’s the correct repo.

KWin is definitely wasteful with VRAM: as of last I checked, it doesn’t track window occlusion, which would allow applications to completely skip rendering new frames when their windows are fully covered, as well as to trim back on actual VRAM use.

I have the sneaking suspicion NVIDIA won’t be fixing their drivers any time soon, but I would be very happy if they did, as even with the D3D12→Vulkan overhead in the interim, I’d still be willing to take the hit, enjoying the “fine wine” later down the line.

I could be very wrong in my interpretation of how the Linux NVIDIA kernel DRM drivers currently work, as I’ve only taken a brief glance, but it seems like they take a black-box approach, using the Linux GEM API as a (very necessary) abstraction layer for userspace compatibility, which behind the scenes makes calls to their own NvKms API, which in turn makes calls to the GSP firmware. If an allocation can’t be made through NvKms, then that GEM error we’re all seeing shows up.

If the NVIDIA kernel DRM driver had proper support for handling system memory, it would at least be able to keep newly launched applications rendering, even if extremely slowly, by handing out GEM objects backed by system RAM instead of a proper device-local VRAM allocation supplied by the GSP firmware through NvKms.

I reckon this will all get sorted through the development of Nova, with NVIDIA’s engineers designing it to work with system memory right off the bat.

Okay, so I did some research, too. One brute force method could be to patch the open kernel drivers in the following way (not tested!):

diff --git a/kernel-open/nvidia-drm/nvidia-drm-gem-nvkms-memory.c b/kernel-open/nvidia-drm/nvidia-drm-gem-nvkms-memory.c
index 6b92c753..5cdcf594 100644
--- a/kernel-open/nvidia-drm/nvidia-drm-gem-nvkms-memory.c
+++ b/kernel-open/nvidia-drm/nvidia-drm-gem-nvkms-memory.c
@@ -367,10 +367,25 @@ int nv_drm_dumb_create(
     allocParams.type = NVKMS_KAPI_ALLOCATION_TYPE_SCANOUT;
     allocParams.size = args->size;
     allocParams.noDisplayCaching = true;
-    allocParams.useVideoMemory = nv_dev->hasVideoMemory;
     allocParams.compressible = &compressible;

+    // First attempt: try to allocate in video memory if available and requested
+    NvBool originalUseVideoMemory = nv_dev->hasVideoMemory;
+    allocParams.useVideoMemory = originalUseVideoMemory;
     pMemory = nvKms->allocateMemory(nv_dev->pDevice, &allocParams);
+
+    if (pMemory == NULL && originalUseVideoMemory) {
+        NV_DRM_DEV_LOG_INFO(
+            nv_dev,
+            "VRAM allocation failed for dumb object of size %" NvU64_fmtu ", "
+            "attempting system memory fallback.",
+            args->size);
+
+        // Fallback attempt: try to allocate in system memory
+        allocParams.useVideoMemory = NV_FALSE;
+        pMemory = nvKms->allocateMemory(nv_dev->pDevice, &allocParams);
+    }
+
     if (pMemory == NULL) {
         ret = -ENOMEM;
         NV_DRM_DEV_LOG_ERR(
@@ -541,12 +556,27 @@ int nv_drm_gem_alloc_nvkms_memory_ioctl(struct drm_device *dev,
     allocParams.type = (p->flags & NV_GEM_ALLOC_NO_SCANOUT) ?
         NVKMS_KAPI_ALLOCATION_TYPE_OFFSCREEN : NVKMS_KAPI_ALLOCATION_TYPE_SCANOUT;
     allocParams.size = p->memory_size;
-    allocParams.useVideoMemory = nv_dev->hasVideoMemory;
     allocParams.compressible = &p->compressible;

+    // First attempt: try to allocate in video memory if available and requested
+    NvBool originalUseVideoMemory = nv_dev->hasVideoMemory;
+    allocParams.useVideoMemory = originalUseVideoMemory;
     pMemory = nvKms->allocateMemory(nv_dev->pDevice, &allocParams);
+
+    if (pMemory == NULL && originalUseVideoMemory) {
+        NV_DRM_DEV_LOG_INFO(
+            nv_dev,
+            "VRAM allocation failed for GEM object of size %" NvU64_fmtu ", "
+            "attempting system memory fallback.",
+            p->memory_size);
+
+        // Fallback attempt: try to allocate in system memory
+        allocParams.useVideoMemory = NV_FALSE;
+        pMemory = nvKms->allocateMemory(nv_dev->pDevice, &allocParams);
+    }
+
     if (pMemory == NULL) {
-        ret = -EINVAL;
+        ret = -EINVAL; // Or -ENOMEM depending on typical failure code
         NV_DRM_DEV_LOG_ERR(nv_dev,
                            "Failed to allocate NVKMS memory for GEM object");
         goto nvkms_alloc_memory_failed;

But this will most likely result in abysmally bad performance for applications that request a device-local render surface but get back a host-memory allocation. I think the purpose of the flag is to explicitly get what you ask for, while its absence allows the driver to place or migrate memory however it thinks is optimal.

So at this point, the logic clearly implies: “If you ask for VRAM, then you get VRAM, or you don’t. Deal with it.”

That also means, looking at the example of niri, that those Wayland compositors always ask for device-local memory, which becomes a problem once VRAM turns into a bottleneck and cannot be migrated by the driver.

So let’s analyze niri a bit further - I’ve used AI for this hypothesis but drew the conclusions myself:

  • Niri uses smithay as its GBM allocator.
  • The interface doesn’t provide a way for niri to say “I want device-local memory” or “I don’t care”.
  • smithay’s GBM allocator uses flags like GBM_BO_USE_RENDERING, which signals the underlying driver (Mesa/NVIDIA) to provide a buffer optimized for GPU rendering. For performance reasons, the driver will always try to place such buffers in VRAM (device-local). This is great for performance but becomes a problem when VRAM is exhausted.
  • So essentially, niri only ever asks for device-local memory, not because it intends to, but because smithay is designed to do it, without fallback logic.
  • The NVIDIA driver is acting correctly here: it denies the request for device-local memory because it cannot allocate it.
  • If smithay had a strategy to request buffers without GBM_BO_USE_RENDERING (e.g., as simple linear buffers), the driver would likely place them in system memory. However, these buffers might not be usable for zero-copy GPU rendering, which would kill compositor performance. The core issue is the lack of a “prefer VRAM, but allow system memory” hint in the API and the application’s logic.
  • Thus, smithay doesn’t have fallback logic (and neither does niri).
  • And seemingly, niri always expects memory allocations to succeed (which is bad in itself).
  • This still points to a weakness in the driver’s memory management. Even if an application rigidly requests device-local memory, an ideal driver should be able to make space by evicting other, less-used DEVICE_LOCAL resources to system memory in the background. The fact that it instead fails the allocation suggests this on-demand eviction isn’t working as robustly as it could. That is, unless all of the VRAM is allocated with DEVICE_LOCAL and the driver’s intention is to keep that memory there - then there is no chance of eviction. And apparently, niri does exactly that.

I’m pretty sure that other Wayland compositors have a similar, non-optimal behavior.


Does this indicate it’s a problem with all NVIDIA GPUs, not just older cards, and that maybe most folks with higher-end cards don’t encounter it because they have so much memory it’s not a problem? This would be especially so if folks turn their computers off daily, so GPU memory leaks get reset.

I wonder what X11 does differently where it’s not a problem.

Thanks for looking into this, especially in the context of niri / smithay.

Yes, I think, even if the driver did migrate some memory allocations (it’s still unclear whether it can, or under what conditions), the situation is much worse for your setup: the small VRAM potentially contains, relatively speaking, a lot more active “pinned” memory (which cannot be migrated) than on systems with more VRAM, because niri and Firefox seem to request it with the device-local flag set - maybe due to inefficiencies in other libraries, so it’s most likely not even their intent. Running such high resolutions, on multiple monitors even, doesn’t help either, because back buffers and window surfaces are potentially much bigger.

But this shows that the problem is more complex than pointing the finger at NVIDIA just because “it works” on other GPUs.

Software should be designed to be as resource-efficient as possible - and my findings suggest that this isn’t the case, regardless of the capabilities of the NVIDIA drivers (which are clearly incomplete).

KDE Plasma seems (just a wild guess) to have a similar issue where each additional background image (per monitor, multiplied by the number of images in a slideshow) seems to be allocated from device-local video memory. This isn’t efficient; in the worst case it’s lazy. And it makes it harder to use old, dated hardware. Especially on NVIDIA that’s a problem, because the driver won’t “swap this memory away”. It should live in system memory in the first place - it’s just an image. It probably shouldn’t even be loaded while it’s not displayed (slideshow).

It’s interesting you say that because I notice a lot of stuttering and jitters when typing into Ghostty (terminal) when I have a full screen terminal on a 1:1 scaled 4k display with ~120 lines of vertical text within Neovim using syntax highlighting.

It’s so much slower and less smooth than it was on Windows with the Microsoft Terminal.

Ghostty will also sometimes fully freeze the output if I hold a key down and it only updates the screen when I release the key. This is much more frequent if I have a few split Vim buffers open displaying even more text. I like a fast repeat rate (40) with low repeat delay (200).

Turning off syntax highlighting helps but I don’t want to give that up or have to manage that, it was fine with the same GPU on Windows.

This only happens when the window is getting modified (adding or removing characters). If I just move the cursor around, there is no perceivable jitter or slowness.

Based on your findings here, would this problem not exist on AMD? I ask for two reasons: first, it’s good to know just as a consumer; second, maybe there’s code they have open sourced that could serve as inspiration for a fix - or are their internal APIs so drastically different that it wouldn’t apply?

I think the APIs are very different. Most of the NVIDIA RM (resource manager) seems to live in the GSP (for the open module), or it’s closed source (for the proprietary driver). From the perspective of the problems we observe here, the design philosophy of resource management seems to be very different with NVIDIA, and it’s baked deeply into the driver and GSP. It explains why this can’t be easily fixed.

That said, if your 2 GB card were an AMD card, you probably wouldn’t face exactly these problems right now. Whether the performance would be better overall, I don’t know.

I also think the NVIDIA devs know very well where to fix the problems - at least on their side. It’s more likely a problem of strategic focus: for Linux, NVIDIA prioritizes data center usage, not gaming. Even if the devs want to and could fix the issues, there are most likely not enough dev resources allocated for a quick fix. And it’s more complex than just the drivers - devs need a good overview of how all the different display compositors work, and why they do certain things in certain ways, and only then optimize the drivers.

But that’s only one part of the solution. User space has to do its homework, too, and not be wasteful with resources. That will benefit all GPU vendors. It’s similar to how Vulkan is in the process of adding a new shader spec to better support what DX12 does: it’s exceptionally bad on NVIDIA due to design decisions, but it’s not perfect for the other GPU vendors either - all suffer some inefficiency from the design decisions made in the Vulkan specs. That’s just another example of others having to do their homework, too.

I’m looking at this in a similar light: there have been design decisions in how Wayland handles memory in the kernel drivers (and in which interfaces the kernel provides: GBM, GEM, KMS, …). And that is seemingly exceptionally bad for NVIDIA while working okay for other vendors. But are those design decisions - most likely made 10+ years ago - really the best design for modern desktops and gaming? No matter the answer, NVIDIA needs to adjust now - better sooner than later.

Please, NVIDIA, focus more resources on this.


It’ll probably be fixed by 2030, just have some patience and faith.


I think it does! Though IMO, it would be best to have a separate bug report thread for Spider-Man instead of combining it with Doom Eternal, and to make it clearer what Linux environment was used, etc. Also, I have a feeling that NVIDIA would prioritize investigating issues that affect a typical Ubuntu setup; the less common or more custom the setup, the less likely they are to prioritize investigating it.

In that thread, someone in the linked GitHub issue mentioned that Doom Eternal has a VRAM leak/crash even on a 12 GB GPU… So I tested it out of curiosity, but it ran flawlessly for me, on a default Fedora (GNOME) setup.

I think the best thing we can do as end-users to contribute to improve the whole situation is to provide high quality reports.

Just wanted to say I appreciate the thoughtful analysis and systematic look at the issue. :) I think this is the kind of inquisitive mindset we should have.
It would be interesting to know what smithay’s authors think of this design/issue…


Well, I cannot just make some statement about my anger without looking behind the curtain. And behind the curtain, there seem to be a lot of issues that are not particularly optimal - not just in the NVIDIA driver.

This doesn’t make the situation better for NVIDIA, but the context is a lot bigger than just this driver.


Yes, and a few days ago I discovered an amplifier of this problem.

niri’s underlying compositor library (smithay) has a confirmed memory leak: every time you close a window, the compositor does not reclaim the memory, so usage keeps climbing. It affects COSMIC too. It’s a problem for both NVIDIA and AMD users.

Here are both GitHub issues:

I posted in the smithay one with reproducible steps showing how it affects pretty much every way an app can run and be stored in memory within Wayland.

Unfortunately this issue has been going on for at least a year unaddressed. Most people reboot daily so they don’t notice it. AMD users are less likely to notice because if their GPU memory fills up, system memory will be used.

It applies here because if you have a lower end NVIDIA GPU, just using your machine normally for half a day with niri or COSMIC (or anything that uses smithay) will result in your GPU memory being pressured so much that you’ll have to reboot.

We (kinda) know the answer to that one. The TTM API is what major in-kernel discrete GPU drivers use, and it still appears to lack equivalents of the vendor-neutral memory pressure, allocation tracking, and residency prioritisation features Windows added to WDDM for D3D12 as early as 2017. These features have become pretty essential to handling both modern video games and mixed compute (including personal AI) workloads on general-purpose desktop PCs.

So what we can glean from this is even if NVIDIA did everything the other in-kernel drivers did using standard APIs today, we still wouldn’t automatically end up with the memory management issues resolved, as they’re not completely resolved for their rivals either. But the end result would still be far better than what we have today.

With all that said, I have a feeling now that some very interested stakeholders contributing to TTM will have all of this (and many other bits) sorted very soon since said stakeholders have a dGPU with limited dedicated VRAM in mind for their latest Linux-based games console.

Whoever catches up to the status quo first is going to end up winning a loyal userbase who probably won’t change their mind once things “just work”.


Thanks, this was very insightful.

This RFC is from 2024. I’m not sure when it was merged. But since the NVIDIA driver is out of tree and has to support LTS distributions, it probably means we can get proper support of it in 2032, maybe later, because relevant LTS distributions ship kernels that are 7+ years old.

I think NVIDIA should really work on a long support branch which supports older kernels with bug fixes, and a modern branch which adapts the current technologies to get rid of this uncomfortable situation.

Exactly because of that. NVIDIA won’t be winning that race - not even close - at their current pace. I already don’t depend on NVENC because it just doesn’t work for me under Wayland (error 999 or something, which means more or less “unknown error”); Intel QSV does a very good replacement job with no CPU overhead, and even frees up GPU resources, at only slightly lower encoding quality. And most features where NVIDIA once excelled now have almost equally performing rivals (DLSS, ray tracing, FG).

So as an alternative suggestion, NVIDIA should start getting involved in Linux gaming handheld SoCs/APUs. It’s probably the best push for driver improvements we could get.

I’m probably going to wait no longer than this year. The costs of upgrading are currently just too high.

I have had many similar experiences when comparing directly with my experience on Windows.

I don’t know much about the technical details like a lot of people here, but I’m experiencing this across multiple games. It’s becoming pretty much the only issue I have with gaming on Linux now: VRAM-intensive games that work fine on Windows, and start fine on Linux, end up filling up the VRAM, giving me “Failed to allocate NVKMS memory” errors and then freezing/crashing from then on.

I can very easily recreate this in a game like Squad, for example, by just going into the offline training range with settings on high/max, creating a bunch of particle effects or just running around and shooting things, and watching the VRAM fill. As soon as it gets close to 8 GB, everything starts to stutter, and pretty soon after it will freeze. I’ve also experienced similar behavior in Kingdom Come: Deliverance 2: on max settings the game runs perfectly fine, but eventually, from traveling around the map, the VRAM fills and it freezes.

I think this is a really big issue with more and more people switching over. All of these games can launch and play fine now, but with 8 GB of VRAM, any modern game with high VRAM requirements puts you on a timer until it fills up and becomes unplayable, which is really frustrating. I’d be really interested in following as much progress on this as possible, as some sort of solution would make gaming on Linux much, much better. Even if it never matches Windows 1:1, the current state is a huge limiting factor for gaming on Linux, IMO.

Regarding some earlier discussion, I agree. I don’t know the details of what the NVIDIA driver on Linux can and can’t do, but the blunt truth is that games and applications that are stable on Windows with 8 GB of VRAM are not stable on Linux. Even when performance degrades on Windows, a game that’s filling my VRAM there still lets me alt-tab, interact with my browser, use OBS, etc. None of that works on Linux right now when a game is filling the VRAM: applications stop rendering or crash, recording/streaming stops working, GPU-accelerated windows break along with the game, and so on.
