Non-existent shared VRAM on NVIDIA Linux drivers

There appear to be multiple potential bugs discussed in this thread, along with incorrect assumptions about how the driver currently works or should work.

First, NVIDIA GPUs can use system memory in addition to video memory, and have always been able to. There is no general concept of “shared GPU memory”: all of system memory is usable by the GPU, and we don’t report it separately on Linux.

The NVX_gpu_memory_info extension is specified a little confusingly, and our interpretation of GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX for the past 15 years has been that it shall not include system memory. I have not been able to track down the thinking behind that particular interpretation, and would naturally be inclined to agree with Mesa’s interpretation; but we wrote the extension and implemented it that way, and changing it now risks breaking existing applications, with no effective benefit, given that this is just a number being reported. It is somewhat confusing; on the other hand, you don’t need NVX_gpu_memory_info to know how much system memory is on your system. The GPU will use that system memory whenever needed.
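
For reference, this is roughly how an application reads those values (a minimal sketch, assuming a current OpenGL context on a driver that exposes GL_NVX_gpu_memory_info; the token values are taken from the extension spec):

    #include <GL/gl.h>
    #include <stdio.h>

    /* Token values from the GL_NVX_gpu_memory_info spec, defined here in
     * case the local glext.h is too old to provide them. */
    #ifndef GL_GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX
    #define GL_GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX         0x9047
    #define GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX   0x9048
    #define GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX 0x9049
    #endif

    /* Assumes a current OpenGL context; the extension reports kibibytes. */
    void print_nvx_memory_info(void)
    {
        GLint dedicated = 0, total = 0, available = 0;
        glGetIntegerv(GL_GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX, &dedicated);
        glGetIntegerv(GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX, &total);
        glGetIntegerv(GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &available);

        /* On the NVIDIA driver, "total available" excludes system memory. */
        printf("dedicated: %d KiB, total: %d KiB, currently available: %d KiB\n",
               dedicated, total, available);
    }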

The suspected root cause as described in the original message is therefore not correct, but the issues are certainly real problems to look into. One needs to be aware that VRAM being full will always have dire consequences on performance, regardless of migration mechanisms being implemented. Not being able to start applications at all, however, does sound like a set of individually valid bugs that should be looked into.

11 Likes

With nvidia-smi -q, I get “BAR1 memory” listed - is this what the GPU will use as memory from the system?

Given this one-liner ($c is counted in a way that explicitly drops CUDA memory from the output):

nvidia-smi -q | perl -e '$a=0;$c=-1;while(<>){$a+=(0+$1)if/Used GPU Memory.*: (.*) MiB/;$n[++$c]=($1)if/\s+(.* Memory Usage)$/;$u[$c]=(0+$1)if/Used.*:(.*) MiB/;$t[$c]=(0+$1)if/Total.*:(.+)/};print"Apps: $a MiB\n";while(--$c>=0){print"$n[$c]: $u[$c] / $t[$c] MiB\n"}'

it will show for me:

Apps: 1386 MiB
BAR1 Memory Usage: 2554 / 16384 MiB
FB Memory Usage: 2533 / 12288 MiB

This would indicate that the GPU is using 16 GB of my system memory and doesn’t have access to my full 32 GB. That doesn’t mean it couldn’t remap the BAR to different locations and thereby access all of my RAM. But my impression is that it cannot see system memory as one linear address space; it has to allocate “views” into my system memory, and that’s what the BAR is for.

From my stats, I would guess that the driver currently uses memory which is mostly shadowed between the frame buffer and system memory - so it is more or less “shared” in the sense that people are asking for here. Some additional MiB are only allocated in GPU memory. Usually, I would think this is memory for the display plane - but one 4K screen needs at least 24 MiB, and I would also expect each screen to have at least a front and a back buffer, which makes at least 48 MiB per screen, multiplied by 3 screens - the numbers just don’t add up that way… OTOH, the numbers could add up properly if some allocations are exclusive to one or the other pool.

Your feedback is really appreciated, and interesting. Some more insight into how the driver works and where to read the meaningful and correct stats is very helpful. Maybe tools like nvidia-smi could just get an additional output mode that properly shows which memory is used where and how?

I get “BAR1 memory” listed - is this what the GPU will use as memory from the system?

Negative. BAR1 on NVIDIA GPUs is used for host-visible video memory, i.e. video memory accessible by the CPU. On modern systems, with “resizable BAR”, BAR1 will be large enough to cover the entirety of video memory.

So this would mean ReBAR is not active although I enabled it in the BIOS, because 16 GB != 32 GB? I’m pretty sure I verified that it is shown as “active” in the nvidia-settings UI when I enabled it.

Ignore that, I just wrapped my mind around this correctly: a 16 GB BAR covers my entire 12 GB of video memory, so it’s fully active, and it’s used so the CPU can access the GPU memory…

But this also means: There’s no way in nvidia-smi to see how much system memory is used by the GPU or driver…

BTW: What helped alleviate some of the memory management errors showing up in dmesg was booting the kernel with init_on_alloc=1. I haven’t run a long gaming test with that yet, but at least “Control” seems to leak less memory with raytracing enabled, and textures stay sharper for longer.

I understand that VRAM being full will have consequences, but when you compare Windows and Linux behavior when VRAM is full, you can tell something is really not right in the Linux drivers. Even browsers become unusable, as we have reported here many times. It’s not about performance, it’s about applications no longer functioning: browsers crash, OBS can’t start NVENC, the desktop environment can’t run its side apps…

10 Likes

If this is not by design, then here’s a thorough explanation of the practical issues we’re experiencing, and how to repro in a lab environment in a simple way which doesn’t require much effort.

The idea that system memory is technically accessible by the GPU isn’t what’s disputed. What’s disputed is whether the drivers can properly demote (offload from dedicated VRAM to system RAM) and promote back again (from system RAM to dedicated VRAM) without the application needing to make any special accommodations, with the ability to actually allocate and use more VRAM than the GPU can physically supply. Since more people are probably familiar with Microsoft parlance, what I’m referring to is implementing proper GPU virtual memory management with system memory fallback support.

Modern NVIDIA drivers on Linux, if they are already meant to provide this functionality (they seem to for some non-C+G processes as of 580?), do so in an extremely broken way for the majority of gaming scenarios on NVIDIA’s gaming GPUs, to the point of the functionality being non-existent in practice.

Try the following practical experiment with an average xx70 or lower GPU (e.g. 2070, 3070, 4070) and you’ll immediately notice the problem:

Pick 4 VRAM-hungry AAA games on Windows and load them one by one into a map, using pssuspend on them [to suspend them] before loading the next one. You’ll now have easily exhausted the pool of dedicated VRAM needed for each individual game to run properly. Switch between them by suspending/unsuspending them [with pssuspend] as you hop around, and watch in awe as NVIDIA’s Windows drivers work supremely well with WDDM to dynamically reallocate VRAM to the game you’re actually playing. Overall performance will in most cases be no different than if you had run only one game, because the drivers are doing their job correctly. In practical, everyday use, any non-CUDA applications will “just work” as long as you don’t over-allocate into more than 50% of system RAM (as this is the WDDM limit). You’ll also be able to launch general-purpose applications and even stream with OBS while doing all this.

Now try the same thing on Linux on a modern Wayland desktop (GNOME, KDE or anything else) using Proton with those same four VRAM-hungry AAA games, using SIGSTOP and SIGCONT in place of pssuspend to switch between them. The VRAM will not be correctly reallocated between the processes, and by the time you get to loading the fourth game the framerate will absolutely tank, if the game opens at all. Worse still, try loading ordinary applications. Some will work and some will fall flat on their face, depending on how VRAM-dependent they are. Many, to their credit, will render transparent windows and still technically accept mouse input under the hood, allowing you to experience the digital equivalent of blindness.
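
For completeness, the suspend/resume step boils down to delivering those signals to the game PIDs; a tiny helper like the following sketch is equivalent to running kill -STOP / kill -CONT from a shell:

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/types.h>

    /* Minimal sketch: suspend or resume a process by PID, exactly what
     * `kill -STOP <pid>` / `kill -CONT <pid>` do from a shell. */
    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <stop|cont> <pid>\n", argv[0]);
            return 1;
        }

        int sig = (strcmp(argv[1], "stop") == 0) ? SIGSTOP : SIGCONT;
        pid_t pid = (pid_t)atoi(argv[2]);

        if (kill(pid, sig) != 0) {
            perror("kill");
            return 1;
        }
        return 0;
    }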

This is what people mean when they say NVIDIA drivers don’t have shared memory support on Linux. The drivers don’t pull the rug on processes which are inactive (by demoting their used video memory to system RAM) to benefit more active ones (by promoting theirs to dGPU VRAM). Worse still, they don’t seem to offer enough resources to new processes in the first place under real-world conditions.

I’d very much appreciate it if this long-standing bug could be fixed as soon as possible, given it falls into the category of expected functionality not being present on Linux (working just fine on Windows 10 since at least 2016 for non-CUDA processes, and exceedingly well since mid-2023 for mixed graphics/compute workloads).

3 Likes

This video is somewhat relevant to this issue:

I have a GeForce 750 Ti (2 GB card) and am encountering this issue with the 580 DKMS official drivers. I completely run out of GPU memory after opening up a few windows, and it causes system instability, with programs crashing and my window compositor breaking in unexpected ways.

Same device works great on Windows

I’ve had this same computer / hardware for 11 years on Windows 10 and never ran into GPU memory issues. I could have many browser windows open, record x264 videos with OBS and also play an assortment of games. Not once did Windows 10 crash in a decade of this usage.

Problems only on Linux

I switched to Arch Linux with niri (a Wayland compositor) the other day and noticed a lot of instability, especially when opening Firefox and trying to record an x264 video with OBS. Likewise, Ghostty (a hardware-accelerated terminal) would core dump if I opened more than a few instances of it.

I set GLVidHeapReuseRatio to 0 as per the niri guide for NVIDIA cards since Wayland apparently has trouble with this. This helped a lot. nvidia-smi reported using only 75 MB for niri instead of the 1 GB it was using previously.

Ongoing issues

Even with the above, niri’s memory usage climbs up and doesn’t get reclaimed when windows are closed. It’s at 300 MB now for reference.

I notice a lot of instability when the GPU memory reaches about 1.5 GB out of 2 GB. Just 2 Firefox windows will use 400 MB for reference. Ghostty uses about 120 MB per instance.

Basically I can’t open more than a few things at once, but my system is only using 3 GB out of 16 GB of system memory. The outcome is much worse than Windows.

nvidia-smi shows 2048 MB total. When I run nvidia-smi -q it shows:

    FB Memory Usage
        Total                                          : 2048 MiB
        Reserved                                       : 58 MiB
        Used                                           : 1093 MiB
        Free                                           : 898 MiB
    BAR1 Memory Usage
        Total                                          : 256 MiB
        Used                                           : 37 MiB
        Free                                           : 219 MiB

I believe this demonstrates that my system’s memory is not being shared / used by the GPU?

Everything I’ve seen so far indicates that as soon as about 1.5 to 2.0 GB of GPU memory is used, everything starts to break, and it’s 100% isolated to Linux.

Rebooting or restarting niri every few hours isn’t a viable solution. Is there anything I can do to get the NVIDIA drivers to use system memory in the same way it did on Windows?

Thanks.

Even if I did reboot every few hours, it still doesn’t help me do the same workflows I did on Windows with the same hardware. It’s unfortunate because I really want to use native Linux full time. Opening up a few browsers and terminals while recording a 1080p 30 FPS video shouldn’t take down my system when the CPU load is 25% and system memory usage is at 20%.

It’s concerning because NVIDIA mentioned the 580 series of drivers will no longer receive updates after August 2026. 10XX and older cards are the ones impacted the most by this, as it renders the system more or less unusable (within reason).

2 Likes

As already pointed out by @ahuillet, the NVIDIA drivers do not track system memory usage:

It would probably help the discussion a lot if nvidia-smi could show those allocations, but for now we have to live with the fact that graphics memory isn’t fully tracked, and trust that the driver supports “shared memory” (which I think is still the wrong word for the concept being asked for here).

The driver cannot tell you about that. The GPU can and does use system memory; it just cannot track its usage in a way that lets user space show such stats.

Also, I learnt that BAR memory works the other way around: it doesn’t say how much memory a process or the GPU uses. Instead, it says how much GPU memory the CPU can currently access directly. On your old system, the BAR only exposes a small window of GPU memory, and that window currently has 219 of 256 MiB free for the CPU to work with and for the GPU to shuffle memory around. This doesn’t limit how much memory can be used, it only affects performance. On modern systems with ReBAR, the full video memory can be made visible to the CPU.

In both cases, your GPU always sees all memory on the card, and it can probably access system memory via DMA (though I’m not sure how that actually works). But rendering is much slower if it needs to access resources from system memory. You can see this happening when nvidia-settings shows permanently high PCIe bus usage. That alone proves that the driver can and does use system memory, and game performance usually tanks when that happens. So it rather shows that memory management is not very optimized and flexible in the Linux drivers, at least for gaming and desktop use, as suggested here:

@nick.janetakis That said, with a GPU as old as yours, maybe you want to try open source drivers like NVK instead? They should work well for desktop, and for the old games this GPU can run, they will probably also perform well enough?

This isn’t the case though, at least not for desktop usage. My performance doesn’t tank. As soon as I hit about 1.7 GB out of 2 GB of GPU memory usage, apps crash, my window manager starts failing in unpredictable ways and the Linux kernel throws errors related to the NVIDIA drivers.

I can reproduce this 100% of the time, not exactly at 1.7 GB, but absolutely without question it will crash when nvidia-smi reports between 1.5 GB and 2.0 GB of usage. This indicates, as you mentioned, that the tool itself probably isn’t reporting things accurately.

In any case, definitions aside, the experience with this card on Windows was perfect for 10 years but it’s kind of not usable on Linux because of all the reasons mentioned in my previous comment.

For example here’s some output from journalctl:

Dec 28 12:41:10 kaizen kernel: [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object

When I start getting this, everything becomes unstable, and even if I close windows to reclaim GPU memory, the system stays in a very unstable state and I have to reboot. Keep in mind I’m not doing anything crazy here; we’re talking a few instances of Firefox and some terminal windows.

Oftentimes pipewire gets involved too right after that error:

Dec 28 12:43:15 kaizen pipewire[1077]: pw.core: 0x55a64d9cd4b0: error -22 for resource 2: port_use_buffers(0:0:0) error: Invalid argument
Dec 28 12:43:15 kaizen pipewire[1077]: mod.client-node: 0x55a64dc63cb0: error seq:268 -22 (port_use_buffers(0:0:0) error: Invalid argument)
Dec 28 12:43:15 kaizen pipewire[1077]: pw.link: (104.0.0 -> 106.0.0) allocating -> error (Buffer allocation failed) (paused-paused)

Restarting pipewire has no effect on regaining stability. This usually happens when I’m trying to record my screen with OBS as the problem hits. And that’s only with x264 1080p 30 FPS settings, nothing too crazy. My CPU is only at 20% load. I’ve recorded literally over 1,000 videos on Windows with the same settings without a single crash or issue.

I tried them initially; it was catastrophically bad. niri (my compositor) identified both of my monitors (a 4K 60 Hz monitor and a 1440p 60 Hz monitor), but nothing would render on the 4K monitor. It was like empty space: the mouse cursor would be invisible on top of it, and it used niri’s default background color instead of showing a wallpaper.

Also, my machine hard locked twice within 30 minutes. After uninstalling them entirely and using the official DKMS 580 series, everything was much more stable, but now I’m battling this issue where if GPU memory touches ~75%+ I have to reboot, which happens a few times a day. Not very stable compared to Windows, IMO.

As an end user who is not involved with GPU driver development all I can do is give feedback on the end to end experience and what the tools tell me.

I’ll repeat:

Windows never crashed once in 10 years of using my machine, in a mindset where I never even paid attention to GPU usage. This includes opening a ton of stuff and playing games too; anything that would run with a decent enough frame rate stayed running (Path of Exile, etc.).

On Linux, I have to reboot multiple times a day and monitor my GPU memory usage all the time, and I can’t open more than a few apps at once or I have to reboot due to instability. All while my CPU load is 2-20% and I’m only using 20% of my system memory (~3 GB out of 16 GB).

I don’t know what else to say. I’d be happy to provide more information if it helps debug and resolve why the drivers don’t “seem to” touch my system memory when the GPU memory is under pressure. Again, definitions aside, that’s what it feels like is happening based on all of the evidence.

I’m also open to trying anything because this is the only thing that’s a problem on Linux for me. Everything else is so good and I really don’t want to go back to Windows, but I can’t continue with a system where I can only open 1 Firefox window and 2 terminals while recording a video. I also haven’t tried recording a video longer than 2 minutes yet; I hope duration doesn’t affect things if GPU memory keeps climbing as the recording gets longer. On Windows I’ve recorded videos that were 3 hours long without issues.

2 Likes

My suggestion:

  1. Test a “mainstream” Linux setup, like Ubuntu 24 LTS with their built-in NVIDIA driver installer and check if you see the same instability. The reason for this is you’d want to show that your issue happens (or doesn’t happen, even better) on a commonly-used, well-tested setup, as opposed to on a niche, custom setup. I suspect NVIDIA may be more motivated to investigate this.
  2. Create your own thread to track the status of the problem, and describe the issue in a practical way. E.g. instead of vague “non-existent shared VRAM”, I would title the problem as “Ubuntu desktop instability on high VRAM utilization” or similar.
    a) Include a concise description of the problem, setup, repro steps, and how the same works fine on Windows.
    b) Include a bug report file, as per https://forums.developer.nvidia.com/t/if-you-have-a-problem-please-read-this-first/
  3. Reference your thread in the 580 driver feedback thread: https://forums.developer.nvidia.com/t/580-release-feedback-discussion/

This problem is very apparent when playing a game like Assetto Corsa and joining a server with lots of cars or a big map. On Windows, I see the VRAM go up to 8 GB (the max for my 3070) and then start filling system RAM. On Linux, I do see this behaviour, but my performance completely tanks the second it happens.

Yet again I would highly and kindly suggest you read this thread in its entirety: this has by now been demonstrated on several supported distros.

@nick.janetakis please do not: such small, very specific threads tend to die quickly and be forgotten. OTOH, keeping this thread alive shows exactly that many people across various distros are affected by the same problem, thus proving its severity.

Again, please kindly read this thread in full before giving advice detached from the fact that several people have very clearly demonstrated what the problem is about.

I really don’t understand your ongoing animosity towards the effort to push NVIDIA to fix this issue: what is your purpose/reasoning behind this?

Do you know if it’s possible to re-create this consistently offline, like with a save file? I think this could be a great candidate for its own bug report thread too, with easy repro steps and a bug report file attached, and a performance comparison with Windows.

The software engineer from NVIDIA has already explained that the suspected root cause was incorrect, and that there were multiple individual bugs that should be investigated. Therefore the best course of action is to report these bugs individually, following bug reporting best practices for each one.

Even better: over a year and a half ago, I shared on another thread a VRAM leak in Spider-Man Remastered (which still happens, btw!): Doom Eternal crashes with 495.46 on high VRAM usage - #14 by faz

The issue presents itself very easily here. Use an NVIDIA GPU with 8 GB of VRAM and set textures to very high. Play and notice VRAM slowly creeping up to 8 GB. As soon as it reaches 8 GB, prepare for mayhem as the game’s FPS begins to drop drastically, with freezes and stutters. At this point, the rest of the system slows to a halt: you can’t open browsers, stuff becomes unresponsive, you can’t record with OBS / NVENC, etc. The game also crashes a few minutes later.

This should not be the case. On Windows, the game does not exhibit this behaviour, and even when it does reach 8 GB the rest of your system is still more responsive than it is on Linux.

Before anyone says that 8 GB of VRAM isn’t enough for this game at very high settings: besides the fact that it runs perfectly on Windows without any problems at all, the official game requirement for very high settings (at 4K!) is an RTX 3070:

Hopefully this helps and we can get to the bottom of this problem :)

2 Likes

I’m having the same problem with MechWarrior 5: Clans, admittedly on ancient hardware (GTX 1060 6 GB).

It’s fine during normal gameplay, but when the match is over and VRAM is close to 6 GB, you’ll be at a stuttery 3-5 FPS back in the briefing room, and it doesn’t recover until the game is restarted.

The same behavior isn’t mirrored on Windows. The Linux driver is 580.119.02 and the Windows driver is 581.80.

Sadly I think this is going to be more common going forward.

1 Like

Same behavior in Squad. Stuttering and freezing/hanging starts whenever VRAM gets to 8 GB. That means when trying to aim, for example, the whole image completely freezes until I quit aiming (zooming) or start looking toward a less populated area. This behavior does not happen on Windows; the game’s performance of course takes a small hit too, but it stays playable on Windows.

So what the whole thread is trying to say is that the GPU hangs whenever VRAM fills because it cannot offload VRAM into RAM, which is basic behavior in the AMD GPU drivers on Linux and in the NVIDIA drivers on Windows. It is a malfunction.

It depends.

It’s the exact terminology used on Microsoft Windows, even if it’s somewhat technically inaccurate due to its historical connotations. On-device GPU VRAM is known as Dedicated GPU Memory, while system RAM repurposed for the GPU to use is known as Shared GPU Memory.

On Linux: AMD calls GPU-accessible system RAM the GTT domain (as opposed to the VRAM domain for dedicated VRAM and the CPU domain for GPU-inaccessible RAM). Intel calls their iGPU memory management implementation PPGTT. In both cases, it’s vernacular describing system RAM made available to the GPU through Graphics Translation Tables.
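
As a concrete illustration of the AMD side (a small sketch; the sysfs paths assume the amdgpu kernel driver and that the GPU is card0, and these files don’t exist for the NVIDIA proprietary driver, which is part of why tools can’t show “shared” usage there):

    #include <stdio.h>

    /* Sketch: read amdgpu's per-domain memory counters from sysfs.
     * VRAM counters cover dedicated video memory, GTT counters cover
     * system RAM mapped for GPU use via the Graphics Translation Tables. */
    static long long read_counter(const char *path)
    {
        long long value = -1;
        FILE *f = fopen(path, "r");
        if (f) {
            if (fscanf(f, "%lld", &value) != 1)
                value = -1;
            fclose(f);
        }
        return value;
    }

    int main(void)
    {
        const char *files[] = { "mem_info_vram_used", "mem_info_vram_total",
                                "mem_info_gtt_used",  "mem_info_gtt_total" };
        char path[256];

        for (int i = 0; i < 4; i++) {
            snprintf(path, sizeof(path),
                     "/sys/class/drm/card0/device/%s", files[i]);
            printf("%s: %lld bytes\n", files[i], read_counter(path));
        }
        return 0;
    }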

In any case, once NVIDIA fixes this big bug that’s easily reproducible, we’ll all be much happier.

2 Likes

Thank you to the folks who replied. I haven’t forgotten about this thread.

I have good news and bad news. Since I posted the other day I was able to refine the problem. Please see Non-existent shared VRAM on NVIDIA Linux drivers - #119 by nick.janetakis for the original details.

Since then I am able to reproduce the problem on:

  • niri (Wayland)
  • KDE Plasma (Wayland)

However KDE Plasma X11 does not have this problem.

All 3 variants (niri, Plasma Wayland, Plasma X11) are using the same Arch Linux distro with the same Linux kernel and same NVIDIA drivers. All packages are the same. The only difference is I’m launching a specific environment when I login.

When the GPU’s memory gets full with X11, I can continue to open more hardware accelerated apps (Firefox, Ghostty terminal, etc.) and my system’s memory will get used just like it did on Windows.

At one point I had something like 38 Firefox windows open with 9 Ghostty terminals and my GPU’s memory was being reported at 1985 / 2048 MB and my system’s memory was at like 8 GB out of 16 GB.

During this state, the system is fully usable. I can play YouTube videos without issues, typing feels normal in Ghostty, etc. I can even record videos with OBS.

As a reminder, with both niri (Wayland) and KDE Plasma (Wayland), what would happen after running out of GPU memory is: Firefox would render blank windows, Ghostty would core dump, and the NVIDIA driver would produce journalctl errors like kaizen kernel: [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object. None of this happens with X11.

With that said, maybe this problem’s scope is now more refined to using Wayland.

Unfortunately niri cannot be run in X11 mode, so this is a real bummer for me, but maybe now a proper bug report could be given to NVIDIA to fix this in the 580 DKMS drivers?

If you ask me (and I’ve had similar experiences), my theory is still that most Wayland clients cannot handle a situation where a memory allocation is rejected, and then just crash. I also think such software probably explicitly asks the driver for device-local memory without allowing host memory (from the GPU’s perspective, host memory is the RAM on the mainboard).

What supports this theory is that using Chrome with Vulkan seems to work around these issues: Vulkan lets you allocate selectively from device-local memory only, from host memory only, or without being specific about it. Obviously, as a dev, if you ask for too much device-local memory, you must also handle the fact that those attempts may be rejected.
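
A minimal sketch of what that fallback looks like in Vulkan (assuming a valid VkDevice / VkPhysicalDevice and that the VkMemoryRequirements of the resource have already been queried; this is an illustration, not taken from any particular application):

    #include <vulkan/vulkan.h>

    /* Sketch: try DEVICE_LOCAL (VRAM) first and, if the driver rejects the
     * allocation (e.g. VK_ERROR_OUT_OF_DEVICE_MEMORY), fall back to a
     * host-visible memory type in system RAM instead of crashing. */
    static uint32_t find_memory_type(VkPhysicalDevice physical_device,
                                     uint32_t type_bits,
                                     VkMemoryPropertyFlags props)
    {
        VkPhysicalDeviceMemoryProperties mem_props;
        vkGetPhysicalDeviceMemoryProperties(physical_device, &mem_props);

        for (uint32_t i = 0; i < mem_props.memoryTypeCount; i++) {
            if ((type_bits & (1u << i)) &&
                (mem_props.memoryTypes[i].propertyFlags & props) == props)
                return i;
        }
        return UINT32_MAX; /* no matching memory type */
    }

    static VkResult allocate_with_fallback(VkDevice device,
                                           VkPhysicalDevice physical_device,
                                           VkMemoryRequirements reqs,
                                           VkDeviceMemory *out_memory)
    {
        const VkMemoryPropertyFlags preferences[2] = {
            VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT,              /* VRAM first */
            VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
            VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,             /* system RAM */
        };
        VkResult result = VK_ERROR_OUT_OF_DEVICE_MEMORY;

        for (int i = 0; i < 2; i++) {
            uint32_t type = find_memory_type(physical_device,
                                             reqs.memoryTypeBits,
                                             preferences[i]);
            if (type == UINT32_MAX)
                continue;

            VkMemoryAllocateInfo info = {
                .sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
                .allocationSize = reqs.size,
                .memoryTypeIndex = type,
            };
            result = vkAllocateMemory(device, &info, NULL, out_memory);
            if (result == VK_SUCCESS)
                return result;
        }
        return result; /* both attempts failed; the caller must handle it */
    }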

What contradicts this theory is that with other drivers like AMD, this usually works just fine.

But OTOH, this may just uncover an underlying issue: that the NVIDIA driver is not able to properly or fully migrate previously allocated memory. And I also don’t know if Wayland even has such a concept, or if it tries to handle those allocation flags “under the hood”, or whether heuristics are going wrong.

But I’m pretty sure that expecting an allocation to always succeed and not handling the error path is the main cause of these crashes. For normal system memory, applications usually get away with such behavior because the kernel overcommits and swaps memory, but it will then fail fatally (OOM kills, or even an oops) if it itself can no longer allocate memory.
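
A small illustration of that overcommit behavior (a sketch, nothing NVIDIA-specific): with the default vm.overcommit_memory setting, even an absurdly large malloc usually succeeds up front, and trouble only starts once the pages are actually touched.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Sketch: the kernel typically only reserves virtual address space here,
     * so the malloc succeeds; physical pages are committed when touched,
     * and the OOM killer intervenes if the system runs out of them. */
    int main(void)
    {
        size_t size = (size_t)64 * 1024 * 1024 * 1024;  /* 64 GiB, virtual */
        char *p = malloc(size);
        if (!p) {
            fprintf(stderr, "malloc failed up front (overcommit disabled?)\n");
            return 1;
        }
        printf("64 GiB malloc succeeded; touching the pages is what commits them\n");

        memset(p, 0, 16 * 1024 * 1024);  /* touch only a small, harmless prefix */
        free(p);
        return 0;
    }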

Maybe it’s an overcommitting issue, where other drivers allow that and NVIDIA doesn’t?

Update: Some quick AI research showed that memory management for Wayland only works via abstract buffers. The driver implementation itself has to properly handle where the memory goes and how it is stored. So yes, this is probably on NVIDIA’s side to fix. Still, apps shouldn’t just crash because they are out of video memory; they should fall back gracefully or show a proper error message… But there’s also something like wl_shm, which requires the compositor and Wayland clients to handle some memory management themselves. The latter is most likely what I called the fallback path before.
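
For context, this is roughly what a wl_shm buffer looks like on the client side (a sketch; it assumes a wl_shm object has already been bound from the registry and drops error handling). The pixels live in plain system memory shared with the compositor over a file descriptor, so no video memory allocation is needed on the client’s side:

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <unistd.h>
    #include <wayland-client.h>

    /* Sketch: create a CPU-side wl_shm buffer backed by an anonymous
     * shared-memory file. The compositor maps the same fd, so both sides
     * see the pixels without any driver-managed video memory. */
    static struct wl_buffer *create_shm_buffer(struct wl_shm *shm,
                                               int width, int height,
                                               void **pixels_out)
    {
        int stride = width * 4;                  /* ARGB8888: 4 bytes/pixel */
        int size = stride * height;

        int fd = memfd_create("shm-buffer", 0);  /* anonymous shared memory */
        ftruncate(fd, size);
        void *pixels = mmap(NULL, size, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);

        struct wl_shm_pool *pool = wl_shm_create_pool(shm, fd, size);
        struct wl_buffer *buffer = wl_shm_pool_create_buffer(
            pool, 0, width, height, stride, WL_SHM_FORMAT_ARGB8888);
        wl_shm_pool_destroy(pool);
        close(fd);                               /* the pool keeps its own reference */

        *pixels_out = pixels;
        return buffer;
    }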