Shared VRAM on Linux --- super huge problem

Is there like a pre-alpha 580 driver I can test to verify whether system-backed VRAM on Linux does or doesn’t work … is this even being worked on? There is a major post on this with absolutely NO response from NVIDIA …

Intel and AMD can do it …

3 Likes

ANYBODY !?

1 Like

I don’t know why this isn’t a priority for them; the majority of people this affects are professional users, people in ML/AI fields, and for those fields Linux seems more popular.

1 Like

Absolute tragedy of customer care

1 Like

@dartefi

Is it possible to use AMD for, for example, LLMs? From what I know they don’t really need much compute at all, just VRAM.

Things like Stable Diffusion use full CUDA compute, I think, so it might not be easy to replace. A few hundred MB over and you hit an OOM. Big problem. Maybe AMD works for this too, but slower? I’d take a bit slower over not being able to compute at all.

Or if anyone else knows?

The perf of LLMs directly depends on the compute power of the GPUs they run on. Period.
The amount of VRAM limits the size of the models you are able to run, which, VERY roughly speaking, translates to the quality of the answers you will get. Also, the bigger the model (in terms of the number of layers/parameters), the (mostly) linearly slower it gets.
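The VRAM-vs-model-size relationship is easy to sanity-check on paper: weight memory scales linearly with parameter count and bytes per parameter. A rough sketch (my own illustration, not from any specific tool; real usage is higher because KV cache and activations come on top):

```python
def est_weight_vram_gib(n_params_billion: float, bits_per_param: float) -> float:
    """Rough VRAM needed just for the model weights, in GiB.

    Actual usage is higher: KV cache, activations and runtime
    overhead all come on top of this number.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 2**30

# A 7B model in FP16 needs ~13 GiB just for weights,
# while a 4-bit quantization fits the same weights in ~3.3 GiB.
print(round(est_weight_vram_gib(7, 16), 1))
print(round(est_weight_vram_gib(7, 4), 1))
```

This is also why being "a few hundred MB over" causes an OOM: the weight footprint is fixed for a given model and quantization, so either it fits or it doesn’t.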

Whether you can use a specific brand of GPU (be it AMD, NVIDIA, Intel, others) depends on which backends (CUDA, ROCm, Vulkan, etc.) your engine (ollama, llama.cpp, vLLM, etc.) supports, and on the specific model of your card: for example, Vulkan is supported by AMD’s consumer models, but the DC models only support ROCm.
Speaking specifically, both ollama and llama.cpp support both Vulkan and ROCm, so you can run LLMs with these engines on AMD cards without problems in most cases. The only problem I’m aware of is if you have a DC AMD model (say MI60 / MI100 / MI210) that does not support Vulkan, connected as an eGPU over Thunderbolt, because PCIe tunneling over Thunderbolt does not support the atomic operations that ROCm needs…
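The engine/backend compatibility question above boils down to a small lookup. A toy sketch of that matrix (simplified and based only on the claims in this thread; check each project’s docs for the authoritative, current list):

```python
# Illustrative engine -> backend matrix; simplified, not authoritative.
ENGINE_BACKENDS = {
    "llama.cpp": {"CUDA", "ROCm", "Vulkan", "Metal", "CPU"},
    "ollama":    {"CUDA", "ROCm", "Vulkan", "Metal", "CPU"},
    "vLLM":      {"CUDA", "ROCm", "CPU"},
}

def engines_for(backend: str) -> list[str]:
    """Which of the listed engines can use the given backend."""
    return sorted(e for e, b in ENGINE_BACKENDS.items() if backend in b)

print(engines_for("Vulkan"))  # e.g. an AMD consumer card
print(engines_for("ROCm"))    # e.g. an AMD DC card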

Whether AMD or NVIDIA will give you more tokens per second of course depends on the specific card models, and to make such comparisons fair you need to look at models from similar price categories, which is not easy due to different compute-power to VRAM ratios:

  • The RTX 5090 has the same amount of VRAM (32GB) as the Radeon PRO R9700; the 5090 is almost twice as fast and costs roughly twice as much.
  • The RTX 5080 has a similar price to the R9700 and they provide similar perf (the 5080 is slightly faster), but the 5080 only has 16GB of VRAM.

Of course, among the cards available on the consumer market, the RTX PRO 6000 is the undisputed king with its 96GB, but it costs roughly as much as 6-7 Radeon PRO R9700 cards…

2 Likes

When using LLMs in my setup, the only difference in speed seems to come from VRAM. Actual GPU usage itself doesn’t really change much and stays mostly under 50%. When processing the prompt before responding it goes to around 80%, but that lasts such a short time I would say it kind of doesn’t count. When responding it stays at only around 30-40%. I’m not using thinking for the model.

It makes me think maybe I just don’t have enough VRAM to make it use the full GPU compute power. It also kind of tells me raw compute isn’t as needed with LLMs. Using CPU only for some of the models was surprisingly fast.
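The low GPU utilization during generation matches the common explanation that single-stream token generation is memory-bandwidth-bound, not compute-bound: each new token requires reading essentially all the weights once, so tokens/s is roughly capped by bandwidth divided by weight size. A back-of-the-envelope sketch with hypothetical bandwidth numbers (my own illustration):

```python
def est_decode_tps(mem_bandwidth_gbs: float, weight_gib: float) -> float:
    """Rough upper bound on single-stream decode tokens/s, assuming
    each token needs one full pass over the weights and that memory
    bandwidth (not compute) is the bottleneck."""
    return mem_bandwidth_gbs * 1e9 / (weight_gib * 2**30)

# Hypothetical numbers: ~1000 GB/s for a high-end GPU vs ~60 GB/s for
# dual-channel system RAM, both serving 4-bit 7B weights (~3.5 GiB).
print(round(est_decode_tps(1000, 3.5)))  # GPU-class bandwidth
print(round(est_decode_tps(60, 3.5)))    # CPU/system-RAM bandwidth
```

This also explains why CPU-only inference can feel "surprisingly fast" for small models, and why prompt processing (which batches many tokens and is compute-bound) is the only phase that pushes GPU utilization up.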

As for shared VRAM, I think it was an issue with the program on Linux. Maybe on Linux the program needs to explicitly offload some of the model to RAM.
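Some engines do make that split explicit: llama.cpp, for instance, lets you choose how many layers go to the GPU (its `-ngl` option) while the rest stay in system RAM. A hypothetical sizing helper in that spirit (the uniform per-layer cost is an assumption; real layers vary in size):

```python
def max_gpu_layers(vram_gib: float, n_layers: int,
                   layer_gib: float, reserve_gib: float = 1.0) -> int:
    """How many transformer layers fit in VRAM while keeping
    `reserve_gib` free for KV cache and scratch buffers; the
    remaining layers would stay in system RAM (think: picking
    a value for llama.cpp's -ngl by hand)."""
    usable = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, int(usable // layer_gib))

# Hypothetical: an 8 GiB card and a 32-layer model at ~0.4 GiB/layer.
print(max_gpu_layers(8.0, 32, 0.4))   # partial offload
print(max_gpu_layers(20.0, 32, 0.4))  # whole model fits
```

Manually splitting this way trades speed for headroom, which is exactly the knob that automatic system-memory fallback would otherwise provide.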

The Stable Diffusion program I used worked fine on Windows but not on Linux under the same workload; it caused an OOM. I have since updated to something else and it works even better than what I used before. The old one was also an unmaintained version, which could be the main cause of the issues.

Thank you for taking time with your reply.