Nvidia driver fails to use system RAM when VRAM is full, leading to crashes or performance problems
This applies to gaming, not to CUDA or similar workloads, where memory sharing is properly supported on Linux (as far as I can tell).
This technology has been supported for the past ten years by every UEFI-compatible GPU driver and hardware vendor, including Intel and AMD, with the exception of Nvidia’s Linux driver. Even Nvidia’s Windows driver supports GTT memory sharing, yet the Linux driver does not.
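To make the difference concrete, here is a toy model of the two behaviors: with a shared-memory pool, an allocation that does not fit in VRAM spills into system RAM; without one, it simply fails. This is a hypothetical sketch, not driver code. The names `GpuMemoryModel` and `run` are invented for illustration, and real drivers evict and migrate buffers rather than using a fixed split like this.

```python
# Toy model (hypothetical, not real driver code): dedicated VRAM plus an
# optional "shared GPU memory" pool backed by system RAM.


class GpuMemoryModel:
    def __init__(self, vram_mib: int, shared_mib: int = 0):
        self.vram_free = vram_mib
        # shared_mib=0 models a driver with no system-memory fallback
        self.shared_free = shared_mib

    def allocate(self, size_mib: int) -> str:
        """Return where the buffer was placed, or raise MemoryError."""
        if size_mib <= self.vram_free:
            self.vram_free -= size_mib
            return "vram"
        if size_mib <= self.shared_free:
            # Slower than VRAM, but the application keeps running
            self.shared_free -= size_mib
            return "shared"
        raise MemoryError(f"out of GPU memory for a {size_mib} MiB allocation")


def run(model: GpuMemoryModel, sizes: list[int]) -> list[str]:
    """Allocate each size in order and record where it landed."""
    placements = []
    for size in sizes:
        try:
            placements.append(model.allocate(size))
        except MemoryError:
            placements.append("fail")
    return placements


# A 2 GB card without a fallback pool vs. the same card with 4 GB of shared memory
print(run(GpuMemoryModel(vram_mib=2048), [1024, 768, 512]))
# ['vram', 'vram', 'fail']
print(run(GpuMemoryModel(vram_mib=2048, shared_mib=4096), [1024, 768, 512]))
# ['vram', 'vram', 'shared']
```

The third allocation is the one that matters: without a fallback pool it fails, which is the crash the posts in this thread describe; with one, it succeeds at the cost of speed.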
This is not a new issue; it has been reported many times, year after year.
Here is a non-exhaustive list:
The NVIDIA Linux driver doesn’t handle VRAM sharing with system RAM.
What exactly I’m referring to is this (screenshot: Windows 10 Task Manager, GPU section):
As can be seen, there is “Dedicated GPU memory” and “Shared GPU memory”. The latter is actual system RAM shared with the GPU, so if the GPU runs out of VRAM, the system or game doesn’t simply crash or suffer a drop in FPS.
So what is the problem then?
The problem is that the “Shared GPU memory” doesn’t exist on the NVIDIA Linux dri…
opened 09:46AM - 28 Dec 24 UTC
bug
### NVIDIA Open GPU Kernel Modules Version
565.77-1 (but it affects every version)
### Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- [ ] I confirm that this does not happen with the proprietary driver package.
### Operating System and Version
Arch Linux
### Kernel Release
6.12.6-arch1-1 (but it affects every Linux kernel version, at least all 6.x.x)
### Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- [x] I am running on a stable kernel release.
### Hardware: GPU
NVIDIA GeForce RTX 4060 Laptop GPU (AD107-B)
### Describe the bug
Just as described in
#663
#618
https://forums.developer.nvidia.com/t/vram-allocation-issues/239678
https://forums.developer.nvidia.com/t/non-existent-shared-vram-on-nvidia-linux-drivers/260304
Standard DRM GTT support is broken in the nvidia-open modules, which makes it impossible to use shared memory on Linux with NVIDIA GPUs.
This is not a minor missing feature but a major functional bug that affects every Linux user with an NVIDIA GPU.
#663 was closed in error, as described by @martynhare in [#663#issuecomment-2487194834](https://github.com/NVIDIA/open-gpu-kernel-modules/issues/663#issuecomment-2487194834), so it's strange to ignore it when this causes many games, Xorg, Wayland, PyTorch, and many other AI-related tools to crash or complain even when there is plenty of RAM available.
### To Reproduce
Just use the latest release of the nvidia-open module; the problem is present there.
`nvidia-uvm` won't help at all, and in 2024 it's hard to find software that uses UVM.
Many AI tools don't support UVM at all, or have a UVM branch that has been unmaintained for years.
As for games, nvidia-uvm is only for CUDA. Some games can use DXVK, which supports using system RAM, but that is not a general solution and doesn't fix the problem.
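While reproducing the problem, it can help to watch how close dedicated VRAM is to full. The sketch below shells out to `nvidia-smi` with its documented `--query-gpu` and `--format` options and parses the result. It is an illustration, not part of the bug report: the function names `parse_memory_csv` and `report_vram` are hypothetical, and the parser assumes every field is numeric (some GPUs report `[N/A]`, which it does not handle).

```python
import shutil
import subprocess

# With "nounits", nvidia-smi prints plain MiB integers, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text: str) -> list[tuple[int, int]]:
    """Parse 'used, total' lines into (used_mib, total_mib) tuples."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def report_vram() -> None:
    """Print dedicated VRAM usage per GPU, if nvidia-smi is available."""
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
        return
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    for index, (used, total) in enumerate(parse_memory_csv(output)):
        print(f"GPU {index}: {used}/{total} MiB used ({100 * used / total:.0f}%)")


if __name__ == "__main__":
    report_vram()
```

Running it repeatedly (for example with `watch -n 1 python3 vram.py`, where `vram.py` is a hypothetical file name) while opening browser windows or starting a game shows the dedicated pool filling up. It reports only what `nvidia-smi` exposes as dedicated VRAM; it cannot show a shared-memory pool, since that is exactly what the thread says is missing.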
### Bug Incidence
Always
### nvidia-bug-report.log.gz
`nvidia-bug-report.log.gz` doesn't help at all. It's a widespread bug affecting every version of nvidia-open in any environment.
Since a bot closes issues that lack a `nvidia-bug-report.log.gz`, I'll upload a dummy one.
[nvidia-bug-report.log.gz](https://github.com/user-attachments/files/18265721/nvidia-bug-report.log.gz)
### More Info
Please fix this bug; it has existed for years and has caused pain for plenty of Linux users who own an NVIDIA GPU.
opened 01:35PM - 14 Jun 24 UTC
closed 11:07AM - 30 Sep 24 UTC
bug
### NVIDIA Open GPU Kernel Modules Version
550.90.07 (latest)
### Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- [ ] I confirm that this does not happen with the proprietary driver package.
### Operating System and Version
Multiple setups (10+); currently Arch
### Kernel Release
Multiple; currently "6.9.3-hardened1-1-hardened"
### Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- [x] I am running on a stable kernel release.
### Hardware: GPU
NVIDIA GeForce GTX 1050 Ti
### Describe the bug
This has been ignored everywhere by NVIDIA employees and developers. There has been no solution since 2016 (and I'm sure the problem predates that). Do we need viral tweets and Reddit posts here and there bashing the company before you listen to us at all?
NVIDIA_UVM does not work even when loaded (checked via lsmod), including on an RTX 30-series card. "nvidia-modprobe" does nothing. There is no dmesg output to show, since everything loads successfully. If the VRAM is full, there is no backup option (no shared RAM as on Windows). We have high-end graphics cards with very little VRAM, and it is slowly becoming clear they were produced this way on purpose.
I'm obviously annoyed. It's 2024. All the other well-known GPU brands (AMD, Intel) don't have this issue; shared memory works just fine. It's a basic feature that should just work, as it does on Windows. The NVIDIA driver still has the most annoying issues on Linux; we know you don't care about Linux users. Wayland issues, late Optimus support on laptops, you name it. If you hate open source this much, don't publish the driver at all and stop further updates. From now on I will vote with my wallet (I know this won't change anything). The internet is begging you for bug fixes, and your indifference shows that you think we have no alternative. For anyone here looking for fixes (there are none at the moment), check out:
1. [NVIDIA Forum Post from 2016 about this very issue](https://forums.developer.nvidia.com/t/shared-system-memory-on-linux/41466)
2. [Same issue on a 2023 NVIDIA Forum post with details](https://forums.developer.nvidia.com/t/non-existent-shared-vram-on-nvidia-linux-drivers/260304)
3. [Someone also raised this issue in the Discussions, but again. Dead silence.](https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/618)
No error logs are needed at this point; it is known that shared memory (nvidia_uvm, unified virtual memory) simply does not work.
If you don't want to buy an expensive GPU from NVIDIA, your only bet is to use Windows so your games and apps do not crash when your VRAM is full. The nvidia_uvm you see in lsmod acts like a placeholder for an empty file. Buy an AMD or Intel GPU for now. As Linus said, [this is the worst company they have had to deal with](https://www.youtube.com/watch?v=iYWzMvlj2RQ).
**So the question is: when will this feature, advertised as working, actually start to work?**
### To Reproduce
Just install the latest proprietary driver and, for once, test the driver yourself as a developer. NVIDIA_UVM does not work, and if it works for you, you used hidden parameters unknown to us. As mentioned below, the nvidia-bug-report.sh script does not work, no matter which parameters are passed.
### Bug Incidence
Always
### nvidia-bug-report.log.gz
nvidia-bug-report.sh does not work no matter what I do; I tried the safe mode parameter, rebooting, etc., and of course ran it as root. If even this hangs, you have bigger issues.
Since I do not know whether a bot or AI manages these issues, I will upload an empty log.gz file.
[nvidia-bug-report.log.gz](https://github.com/user-attachments/files/15838693/nvidia-bug-report.log.gz)
### More Info
You know the problem better than I do. Please check the links I posted. Important forum posts like these should at least get an answer.
https://forums.developer.nvidia.com/t/vram-allocation-issues/239678
This is very obviously a duplicate issue. I called your phone support line about it and was told to create a forum post to obtain support, so here I am.
The sysmem fallback is a nightmare. Instead of fighting the driver, I started shrinking the models. LoRA Lens uses SVD math to keep the weights small enough to stay in the dedicated VRAM. No sharing needed, no crashes. It’s the only way to get stability on Linux/Windows right now.
I didn’t mention AI or LLMs; what are you talking about? Of course it’s related, but it’s also a total non sequitur.
I can confirm this with my GeForce GTX 750 Ti (a 2 GB card): on Windows, I could run any workload I wanted, and it offloaded to system memory seamlessly. It has done this since 2014, when I put this system together.
On the same system running Arch Linux (any Wayland DE) with the latest 580 drivers, I could barely open a few browsers and terminals before running out of GPU memory, at which point things would crash or stop working normally. I included far more details in one of the threads you linked: Non-existent shared VRAM on NVIDIA Linux drivers.
I ended up spending almost two full weeks troubleshooting as much as I could. I wrote an 8,500-word blog post, made videos, and opened GitHub issues that got a reply from NVIDIA but no resolution.
I still use the same system, but I have since switched to an AMD GPU (RX 480), and the problem went away entirely. I’m only posting this to confirm that it very strongly appears to be a problem with NVIDIA’s drivers and not AMD’s.