Non-existent shared VRAM on NVIDIA Linux drivers

The NVIDIA Linux driver doesn’t support sharing system RAM with the GPU when VRAM runs out.

What I’m referring to is what the Windows 10 Task Manager shows in its GPU section.

As can be seen there, there is the “Dedicated GPU memory” and the “Shared GPU memory”, which is actual system RAM shared with the GPU. If the GPU runs out of VRAM, the system or game doesn’t simply crash or suffer a drop in FPS.

So what is the problem then?

The problem is that the “Shared GPU memory” doesn’t exist on the NVIDIA Linux driver, leading to big issues when the GPU VRAM fills up.
Issues like these (when the VRAM is full):

  • Browsers (Chromium-based and Firefox) cannot be opened
  • Low FPS in games with the GPU at 100% utilization
  • The user’s desktop environment can crash (in my case, when the VRAM fills up, my whole KDE Plasma desktop crashes and restarts itself)
  • OBS cannot be used and complains with an error like this: Failed to open NVENC: Out of memory
  • Even light programs like Cantata (music player) and GPU-accelerated terminals like Kitty can’t be opened
  • Other issues that I don’t remember

So, in my case (and probably for everyone who uses an NVIDIA card under Linux), the available VRAM is just the dedicated video memory and nothing else; there is no backup.

To check the “Total available memory”, use glxinfo -B. In my case it reports:

...
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 3072 MB
    Total available memory: 3072 MB
    Currently available dedicated video memory: 1855 MB
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: NVIDIA GeForce GTX 1060 3GB/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 535.86.05
OpenGL core profile shading language version string: 4.60 NVIDIA
...

For comparison, this is the output on an AMD card:

...
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 16384 MB
    Total available memory: 32000 MB
    Currently available dedicated video memory: 15268 MB
...

The AMD drivers can successfully use a portion of the system RAM to prevent the issues that I mentioned before (Total available memory: 32000 MB).
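The comparison above can be automated. The following is only a rough sketch, assuming the output of glxinfo -B has been captured as text and that the driver exposes GL_NVX_gpu_memory_info; the function names parse_nvx_memory_info and shared_memory_mb are made up for illustration, and the two samples are the outputs quoted above.

```python
import re

# Samples copied from the glxinfo -B outputs quoted in this post.
NVIDIA_SAMPLE = """\
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 3072 MB
    Total available memory: 3072 MB
    Currently available dedicated video memory: 1855 MB
"""

AMD_SAMPLE = """\
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 16384 MB
    Total available memory: 32000 MB
    Currently available dedicated video memory: 15268 MB
"""


def parse_nvx_memory_info(glxinfo_text):
    """Return the three memory values (in MB) found in `glxinfo -B` text."""
    patterns = {
        "dedicated": r"Dedicated video memory:\s*(\d+)\s*MB",
        "total": r"Total available memory:\s*(\d+)\s*MB",
        "available": r"Currently available dedicated video memory:\s*(\d+)\s*MB",
    }
    info = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, glxinfo_text)
        if match:
            info[key] = int(match.group(1))
    return info


def shared_memory_mb(info):
    """System RAM reported on top of the dedicated VRAM (0 means no backup)."""
    return max(info["total"] - info["dedicated"], 0)


if __name__ == "__main__":
    for name, sample in (("NVIDIA", NVIDIA_SAMPLE), ("AMD", AMD_SAMPLE)):
        info = parse_nvx_memory_info(sample)
        print(name, "shared MB:", shared_memory_mb(info))
```

To use it on a live system, the text could be obtained with something like subprocess.run(["glxinfo", "-B"], capture_output=True, text=True).stdout, provided glxinfo is installed.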

To provide more information about this issue: vulkaninfo can be used to see the memoryHeaps, which describe both the dedicated memory of the GPU and the system RAM that could be shared with it:

memoryHeaps: count = 3
	memoryHeaps[0]:
		size   = 3221225472 (0xc0000000) (3.00 GiB)
		budget = 1912930304 (0x72050000) (1.78 GiB)
		usage  = 0 (0x00000000) (0.00 B)
		flags: count = 1
			MEMORY_HEAP_DEVICE_LOCAL_BIT
	memoryHeaps[1]:
		size   = 12521017344 (0x2ea4f9000) (11.66 GiB)
		budget = 12521017344 (0x2ea4f9000) (11.66 GiB)
		usage  = 0 (0x00000000) (0.00 B)
		flags:
			None
	memoryHeaps[2]:
		size   = 257949696 (0x0f600000) (246.00 MiB)
		budget = 236322816 (0x0e160000) (225.38 MiB)
		usage  = 21626880 (0x014a0000) (20.62 MiB)
		flags: count = 1
			MEMORY_HEAP_DEVICE_LOCAL_BIT

memoryHeaps[0]: GPU VRAM
memoryHeaps[1]: System RAM that could be used as shared memory but that the current NVIDIA Linux driver doesn’t use.
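As a rough sketch of how the heap listing above can be read programmatically, the snippet below parses vulkaninfo text and separates device-local heaps (VRAM) from the rest (system RAM). The function name parse_memory_heaps is hypothetical, the sample is the output quoted above, and the parser assumes the layout shown there; other vulkaninfo versions may format it differently.

```python
import re

# The memoryHeaps section quoted above, as printed by vulkaninfo.
SAMPLE = """\
memoryHeaps: count = 3
	memoryHeaps[0]:
		size   = 3221225472 (0xc0000000) (3.00 GiB)
		budget = 1912930304 (0x72050000) (1.78 GiB)
		usage  = 0 (0x00000000) (0.00 B)
		flags: count = 1
			MEMORY_HEAP_DEVICE_LOCAL_BIT
	memoryHeaps[1]:
		size   = 12521017344 (0x2ea4f9000) (11.66 GiB)
		budget = 12521017344 (0x2ea4f9000) (11.66 GiB)
		usage  = 0 (0x00000000) (0.00 B)
		flags:
			None
	memoryHeaps[2]:
		size   = 257949696 (0x0f600000) (246.00 MiB)
		budget = 236322816 (0x0e160000) (225.38 MiB)
		usage  = 21626880 (0x014a0000) (20.62 MiB)
		flags: count = 1
			MEMORY_HEAP_DEVICE_LOCAL_BIT
"""


def parse_memory_heaps(text):
    """Return one dict per heap with its size, budget, usage and locality."""
    heaps = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = re.match(r"memoryHeaps\[(\d+)\]:", line)
        if header:
            current = {
                "index": int(header.group(1)),
                "size": 0,
                "budget": 0,
                "usage": 0,
                "device_local": False,
            }
            heaps.append(current)
            continue
        if current is None:
            continue  # skip the "memoryHeaps: count = N" summary line
        field = re.match(r"(size|budget|usage)\s*=\s*(\d+)", line)
        if field:
            current[field.group(1)] = int(field.group(2))
        elif "MEMORY_HEAP_DEVICE_LOCAL_BIT" in line:
            current["device_local"] = True
    return heaps


if __name__ == "__main__":
    for heap in parse_memory_heaps(SAMPLE):
        kind = "VRAM" if heap["device_local"] else "system RAM"
        print(f"heap {heap['index']}: {heap['size'] / 2**30:.2f} GiB ({kind})")
```

The budget and usage fields are only printed by vulkaninfo when the driver supports VK_EXT_memory_budget, so on other setups they would simply stay at 0 in this sketch.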

Are there any plans to fix this? This should be in every driver on every operating system; it is not an optional feature. If I need to provide an nvidia-bug-report file, just ask for it. I have a 1060 3GB and a 3070 Max-Q 8GB.

This thread is closely related to this one: VRAM Allocation Issues

I share the same experience as @Fijxu described. I wish it got some reply from NVIDIA staff!

Will this be addressed some day, or will it be thrown in the trash like other issues out there?

NVIDIA Linux drivers are the only ones with this problem. Intel, AMD and even the NVIDIA drivers for Windows don’t have this issue. This is very important not only because it is a basic feature that the drivers should have, but also for CUDA (which seems to be the main reason why the Linux NVIDIA drivers exist, besides “gaming”). If you run out of memory when performing CUDA operations, you have no option other than using Windows or being forced to buy another GPU with more VRAM.

I hope this issue will be addressed some day. I don’t own an NVIDIA GPU just to face this kind of problem in my daily workflow.

They cannot fix a … a night light feature… You are asking way, way too much from this company.

It took them 5-10 years to let people use high resolution monitors with display stream compression.

So, will any developer or NVIDIA driver contributor reply about this basic feature that should be in every driver? I am sick of trying to play any game or do any graphical work with low FPS just because my VRAM is at 3.0 GB.

A long time later, I have the same problem as you.

OS: Gentoo Linux
Graphics:
  Device-1: NVIDIA GA104 [GeForce RTX 3060 Ti] driver: nvidia v: 535.154.05
  Device-2: AMD Cezanne [Radeon Vega Series / Radeon Mobile Series]
    driver: amdgpu v: kernel
  Device-3: Logitech Webcam C270 driver: snd-usb-audio,uvcvideo type: USB
  Display: x11 server: X.org v: 1.21.1.11 with: Xwayland v: 23.2.4 driver:
    X: loaded: amdgpu,nvidia unloaded: modesetting dri: radeonsi gpu: amdgpu
    resolution: 1920x1080
  API: EGL v: 1.5 drivers: nvidia,radeonsi,swrast
    platforms: x11,surfaceless,device
  API: OpenGL v: 4.6 vendor: amd mesa v: 23.3.1 renderer: AMD Radeon
    Graphics (radeonsi renoir LLVM 17.0.6 DRM 3.49 6.1.67-gentoo)
  API: Vulkan v: 1.3.268 drivers: radv,nvidia surfaces: xcb,xlib

Same issue with a GTX 1650.

Having the same problem on a GTX 1060.

It’s 2024 and I have the same issue with my RTX 3050 Mobile.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 …      On  | 00000000:01:00.0 Off |                  N/A |
| N/A   50C    P0              17W /  80W |    189MiB /  4096MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 4096 MB
    Total available memory: 4096 MB
    Currently available dedicated video memory: 3696 MB

I actually get crashes in games and also with Stable Diffusion (running out of memory).

Additional info:
Laptop: ASUS-TUF-Gaming-F17-FX707ZC4

When I used the older version of 535, it was running fine. The problem only occurred with the last two driver updates (I hoped the second one would fix it again, but had no luck).

What was changed since the initial 535 release?

AFAIK, no one has posted an issue on the NVIDIA open driver repo: Issues · NVIDIA/open-gpu-kernel-modules · GitHub. Maybe it would gain more visibility there?

Isn’t that only for the open kernel modules?
I use the proprietary version.

I’m not sure if I’m missing something here, but I believe that in order to use unified memory you must load the nvidia-uvm kernel module.

My understanding is that the path of least resistance is to load it via the nvidia-modprobe tool:

https://manpages.ubuntu.com/manpages/xenial/man1/nvidia-modprobe.1.html

something along the lines of: /usr/bin/nvidia-modprobe --unified-memory --create-nvidia-device-file=1
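To confirm that the module actually got loaded after running that command, the kernel’s module list can be inspected. This is only a minimal sketch, assuming the module appears under the name nvidia_uvm in /proc/modules; the helper names and the sample lines (sizes, addresses) are made up for illustration. It only tells you whether the module is loaded, not whether graphics workloads will spill into system RAM.

```python
from pathlib import Path

# Hypothetical /proc/modules excerpt; the sizes and addresses are invented.
SAMPLE_PROC_MODULES = """\
nvidia_uvm 1540096 0 - Live 0x0000000000000000 (POE)
nvidia_drm 77824 4 - Live 0x0000000000000000 (POE)
nvidia_modeset 1306624 6 nvidia_drm, Live 0x0000000000000000 (POE)
nvidia 56717312 230 nvidia_uvm,nvidia_modeset, Live 0x0000000000000000 (POE)
"""


def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_uvm_loaded(proc_modules_text):
    """True if the nvidia_uvm kernel module shows up in the module list."""
    return "nvidia_uvm" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    # Fall back to the invented sample when not running on Linux.
    text = proc_modules.read_text() if proc_modules.exists() else SAMPLE_PROC_MODULES
    print("nvidia_uvm loaded:", is_uvm_loaded(text))
```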

Hope that helps!

Thanks. I will try it later if I can do it.

VRAM management is simply, really, really bad with Nvidia on Linux. It doesn’t matter if you enable this or not; it does not help, and Nvidia isn’t doing anything about it; everything just becomes a lagfest or outright crashes if you are running out of VRAM.

Well, I tried it and it didn’t work at all.
It was kinda tricky to do, but I basically unloaded the nvidia modules and loaded them again using /usr/bin/nvidia-modprobe --unified-memory --create-nvidia-device-file=1 (with modeset on for the nvidia_uvm module).

Nothing changed at all.

I would try with the open source driver if you haven’t already. There was something about needing that, a 6.1x kernel w/ HMM (Heterogeneous Memory) ‘CONFIG_HMM=y’ and enabling that in the module when loading. The goal post on this, so to speak, keeps moving. @techsav may be right in that it’s more trouble than it’s worth… not that that’s ever really stopped me from trying anyway myself ;)

“open source driver if you haven’t already”

I have a 1060, so it’s over for me. I have a laptop with a 3070, but it’s not worth trying because I don’t even play on it.