Non-existent shared VRAM on NVIDIA Linux drivers

Fijxu · July 19, 2023, 8:18am

The NVIDIA Linux driver doesn’t share system RAM with the GPU when the VRAM runs out.

What I’m referring to is what the Windows 10 Task Manager shows in its GPU section.

As can be seen there, there is the “Dedicated GPU memory” and the “Shared GPU memory”, which is actual system RAM shared with the GPU, so if the GPU runs out of VRAM, the system or game doesn’t simply crash or drop in FPS.

So what is the problem then?

The problem is that the “Shared GPU memory” doesn’t exist on the NVIDIA Linux driver, leading to big issues when the GPU VRAM fills up.
Issues like these (when the VRAM is full):

The browser (Chromium based and Firefox) cannot be opened
Low FPS in games and the GPU at 100% utilization
The desktop environment in use can crash (in my case, when the VRAM fills up, my whole KDE Plasma desktop just crashes and restarts itself)
OBS cannot be used; it fails with an error like this: Failed to open NVENC: Out of memory
Even light programs like Cantata (a music player) and GPU-accelerated terminals like Kitty can’t be opened
Other issues that I don’t remember

So, in my case (and probably for everyone who uses an NVIDIA card under Linux), the available VRAM is just the dedicated video memory and nothing else; there is no backup.

To check the “Total available memory”, use glxinfo -B. In my case the output is:

...
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 3072 MB
    Total available memory: 3072 MB
    Currently available dedicated video memory: 1855 MB
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: NVIDIA GeForce GTX 1060 3GB/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 535.86.05
OpenGL core profile shading language version string: 4.60 NVIDIA
...

For comparison, this is the output from an AMD card:

...
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 16384 MB
    Total available memory: 32000 MB
    Currently available dedicated video memory: 15268 MB
...

The AMD drivers can successfully use a portion of the system RAM to prevent the issues that I mentioned before (Total available memory: 32000 MB).
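To put a number on the difference between the two outputs, here is a minimal Python sketch that parses the three GL_NVX_gpu_memory_info lines printed by glxinfo -B and subtracts the dedicated memory from the total. The function names and sample strings are illustrative only (the samples are copied from the outputs above), and reading the difference as "shared" memory is an assumption about these fields, not something the driver vendors state in this thread.

```python
import re

# Matches the three memory lines that `glxinfo -B` prints for
# GL_NVX_gpu_memory_info, e.g. "    Dedicated video memory: 3072 MB".
_FIELD_RE = re.compile(
    r"\s*(Dedicated video memory|Total available memory|"
    r"Currently available dedicated video memory):\s*(\d+)\s*MB"
)


def parse_gl_memory_info(text):
    """Return a dict mapping each memory field name to its value in MB."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def shared_memory_mb(fields):
    """Assumption: memory beyond the dedicated VRAM is system RAM the driver can use."""
    return fields["Total available memory"] - fields["Dedicated video memory"]


# Samples copied from the outputs quoted in this post.
NVIDIA_SAMPLE = """\
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 3072 MB
    Total available memory: 3072 MB
    Currently available dedicated video memory: 1855 MB
"""

AMD_SAMPLE = """\
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 16384 MB
    Total available memory: 32000 MB
    Currently available dedicated video memory: 15268 MB
"""

if __name__ == "__main__":
    for name, sample in (("NVIDIA", NVIDIA_SAMPLE), ("AMD", AMD_SAMPLE)):
        info = parse_gl_memory_info(sample)
        print(f"{name}: {shared_memory_mb(info)} MB beyond dedicated VRAM")
```

On a real system the text could come from subprocess.run(["glxinfo", "-B"], capture_output=True, text=True).stdout instead of the hard-coded samples. With the samples above, the difference is 0 MB for the GTX 1060 and 15616 MB for the AMD card.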

To provide more information about this issue: vulkaninfo can be used to see the memoryHeaps, which describe both the system RAM that could be shared with the GPU and the GPU’s dedicated memory:

memoryHeaps: count = 3
	memoryHeaps[0]:
		size   = 3221225472 (0xc0000000) (3.00 GiB)
		budget = 1912930304 (0x72050000) (1.78 GiB)
		usage  = 0 (0x00000000) (0.00 B)
		flags: count = 1
			MEMORY_HEAP_DEVICE_LOCAL_BIT
	memoryHeaps[1]:
		size   = 12521017344 (0x2ea4f9000) (11.66 GiB)
		budget = 12521017344 (0x2ea4f9000) (11.66 GiB)
		usage  = 0 (0x00000000) (0.00 B)
		flags:
			None
	memoryHeaps[2]:
		size   = 257949696 (0x0f600000) (246.00 MiB)
		budget = 236322816 (0x0e160000) (225.38 MiB)
		usage  = 21626880 (0x014a0000) (20.62 MiB)
		flags: count = 1
			MEMORY_HEAP_DEVICE_LOCAL_BIT

memoryHeaps[0]: GPU VRAM
memoryHeaps[1]: System RAM that could be used as shared memory, but that the current NVIDIA Linux driver doesn’t use.
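The heap listing can also be read programmatically. The following is a rough Python sketch, assuming the text layout shown above (other vulkaninfo versions may print it differently); parse_memory_heaps and the dictionary keys are names made up for this example. It separates heaps flagged MEMORY_HEAP_DEVICE_LOCAL_BIT from the ones without that flag, which is the split described in the two lines above.

```python
import re


def parse_memory_heaps(text):
    """Parse the memoryHeaps section of vulkaninfo text output into a list of dicts."""
    heaps = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        header = re.match(r"memoryHeaps\[(\d+)\]:", stripped)
        if header:
            current = {"index": int(header.group(1)), "device_local": False}
            heaps.append(current)
            continue
        if current is None:
            continue  # still before the first heap entry
        field = re.match(r"(size|budget|usage)\s*=\s*(\d+)", stripped)
        if field:
            current[field.group(1)] = int(field.group(2))  # value in bytes
        elif stripped == "MEMORY_HEAP_DEVICE_LOCAL_BIT":
            current["device_local"] = True
    return heaps


# Sample copied from the vulkaninfo output quoted in this post.
SAMPLE = """\
memoryHeaps: count = 3
	memoryHeaps[0]:
		size   = 3221225472 (0xc0000000) (3.00 GiB)
		budget = 1912930304 (0x72050000) (1.78 GiB)
		usage  = 0 (0x00000000) (0.00 B)
		flags: count = 1
			MEMORY_HEAP_DEVICE_LOCAL_BIT
	memoryHeaps[1]:
		size   = 12521017344 (0x2ea4f9000) (11.66 GiB)
		budget = 12521017344 (0x2ea4f9000) (11.66 GiB)
		usage  = 0 (0x00000000) (0.00 B)
		flags:
			None
	memoryHeaps[2]:
		size   = 257949696 (0x0f600000) (246.00 MiB)
		budget = 236322816 (0x0e160000) (225.38 MiB)
		usage  = 21626880 (0x014a0000) (20.62 MiB)
		flags: count = 1
			MEMORY_HEAP_DEVICE_LOCAL_BIT
"""

if __name__ == "__main__":
    for heap in parse_memory_heaps(SAMPLE):
        kind = "device-local" if heap["device_local"] else "host (system RAM)"
        print(f"heap {heap['index']}: {heap['size'] / 2**30:.2f} GiB, {kind}")
```

Note that this only reports what the driver advertises to Vulkan applications; it says nothing about whether the driver will fall back to the host heap when the device-local heaps are full, which is the behavior this thread is about.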

Are there any plans to fix this? This should be in every driver on every operating system; it is not an optional feature. If I need to provide an nvidia-bug-report file, just ask for it. I have a 1060 3GB and a 3070 Max-Q 8GB.

This thread is closely related to this one: VRAM Allocation Issues

Unradelic · August 11, 2023, 7:27pm

I share the same experience as @Fijxu described. I wish it got some reply from NVIDIA staff!

Fijxu · August 19, 2023, 7:18pm

Will this be addressed some day, or will it be thrown in the trash like other issues out there?

NVIDIA Linux drivers are the only ones with this problem. Intel, AMD and even the NVIDIA drivers for Windows don’t have this issue. This is very important not only because it is a basic feature that drivers should have, but also for CUDA (which seems to be the main reason the NVIDIA Linux drivers exist, besides “gaming”). If you run out of memory when performing CUDA operations, you have no option other than using Windows or being forced to buy another GPU with more VRAM.

I hope this issue will be addressed some day. I don’t own an NVIDIA GPU just to face these kinds of problems in my daily workflow.

perryman337 · August 21, 2023, 6:50pm

They cannot fix a … a night light feature… You are asking way, way too much from this company.

It took them 5-10 years to let people use high resolution monitors with display stream compression.

Fijxu · December 15, 2023, 4:47pm

So, will any developer or NVIDIA driver contributor reply about this basic feature that should be in every driver? I am sick of trying to play any game or do any graphical work with low FPS just because my VRAM is at 3.0GB.

nairomarttins97 · January 26, 2024, 5:11pm

A long time later, I have the same problem as you.

OS: Gentoo Linux
Graphics:
Device-1: NVIDIA GA104 [GeForce RTX 3060 Ti] driver: nvidia v: 535.154.05
Device-2: AMD Cezanne [Radeon Vega Series / Radeon Mobile Series]
driver: amdgpu v: kernel
Device-3: Logitech Webcam C270 driver: snd-usb-audio,uvcvideo type: USB
Display: x11 server: X.org v: 1.21.1.11 with: Xwayland v: 23.2.4 driver:
X: loaded: amdgpu,nvidia unloaded: modesetting dri: radeonsi gpu: amdgpu
resolution: 1920x1080
API: EGL v: 1.5 drivers: nvidia,radeonsi,swrast
platforms: x11,surfaceless,device
API: OpenGL v: 4.6 vendor: amd mesa v: 23.3.1 renderer: AMD Radeon
Graphics (radeonsi renoir LLVM 17.0.6 DRM 3.49 6.1.67-gentoo)
API: Vulkan v: 1.3.268 drivers: radv,nvidia surfaces: xcb,xlib

omarhanykasban706 · March 9, 2024, 6:11pm

Same issue with a GTX 1650.

hanstrus · March 11, 2024, 4:38pm

Having the same problem on a GTX 1060.

seb.j1 · March 20, 2024, 4:33pm

It’s 2024 and I have the same issue with my RTX 3050 Mobile.

Memory info (GL_NVX_gpu_memory_info):
Dedicated video memory: 4096 MB
Total available memory: 4096 MB
Currently available dedicated video memory: 3696 MB

I actually get crashes in games and also with Stable Diffusion (running out of memory).

seb.j1 · March 21, 2024, 3:27pm

Additional info:
Laptop: ASUS-TUF-Gaming-F17-FX707ZC4

When I used the older version of 535, it was running fine; the problem only occurred with the last two driver updates (I hoped the second one would fix it again, but had no luck).

What was changed since the initial 535 release?

user46827 · March 24, 2024, 5:25pm

AFAIK, no one has posted an issue on the NVIDIA open driver repo (Issues · NVIDIA/open-gpu-kernel-modules · GitHub). Maybe it would gain more visibility there?

seb.j1 · April 17, 2024, 7:29am

Isn’t that only for the open kernel modules?
I use the proprietary version.

itsdotscience · May 6, 2024, 3:42pm

I’m not sure if I’m missing something here but I believe in order to use unified memory you must load the nvidia-uvm kernel module.

My understanding is the path of least resistance is to load it via the nvidia-modprobe tool

https://manpages.ubuntu.com/manpages/xenial/man1/nvidia-modprobe.1.html

something along the lines of: /usr/bin/nvidia-modprobe --unified-memory --create-nvidia-device-file=1

Hope that helps!
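Before and after running nvidia-modprobe, it may be worth confirming that the nvidia_uvm module is actually loaded. A small Python sketch of one way to check, by reading the module list the kernel exposes in /proc/modules; the helper names and the sample text (including its sizes and addresses) are hypothetical and only illustrate the file format.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text.

    Each non-empty line starts with the module name, followed by its size,
    reference count, dependencies, state and load address.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def uvm_loaded(proc_modules_text):
    """True if the nvidia_uvm kernel module appears in the module list."""
    return "nvidia_uvm" in loaded_modules(proc_modules_text)


# Hypothetical sample; the numbers are made up for illustration.
SAMPLE = """\
nvidia_uvm 1000000 0 - Live 0x0000000000000000 (POE)
nvidia_drm 100000 4 - Live 0x0000000000000000 (POE)
nvidia 50000000 200 nvidia_uvm,nvidia_drm, Live 0x0000000000000000 (POE)
"""

if __name__ == "__main__":
    print("nvidia_uvm loaded:", uvm_loaded(SAMPLE))
```

On a real machine, pass the contents of /proc/modules instead of SAMPLE, for example uvm_loaded(open("/proc/modules").read()). A loaded module only means unified memory is available to CUDA programs; it does not by itself show that graphics workloads can spill into system RAM.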

Fijxu · May 6, 2024, 3:55pm

Thanks. I will try it later if I can do it.

techsav · May 16, 2024, 12:27am

VRAM management is simply, really, really bad with Nvidia on Linux. It doesn’t matter if you enable this or not; it does not help, and Nvidia isn’t doing anything about it; everything just becomes a lagfest or outright crashes if you are running out of VRAM.

Fijxu · May 16, 2024, 3:23am

Well, I tried it and it didn’t work at all.
It was kinda tricky to do, but I basically unloaded the nvidia modules and loaded them again using /usr/bin/nvidia-modprobe --unified-memory --create-nvidia-device-file=1 (with modeset on for the nvidia_uvm module).

Nothing changed at all.

itsdotscience · May 16, 2024, 6:38am

I would try with the open source driver if you haven’t already. There was something about needing that, a 6.1x kernel w/ HMM (Heterogeneous Memory Management) ‘CONFIG_HMM=y’ and enabling that in the module when loading. The goal post on this, so to speak, keeps moving. @techsav may be right in that it’s more trouble than it’s worth…not that that’s ever really stopped me from trying anyways myself ;)

Fijxu · May 16, 2024, 6:41am

“open source driver if you haven’t already”

I have a 1060 so it’s over for me. I have a laptop with a 3070 but it’s not worth trying because I don’t even play on it.

Fijxu · June 6, 2024, 10:01pm

I decided to test the 555 driver because of Explicit Sync. It works really well on Wayland and the desktop animations are finally smooth (after years of waiting…), but the VRAM management is still really bad, and it’s worse on Wayland: now, if a game or program hits the VRAM limit of the card, the whole desktop just crashes, closing everything that uses the GPU, with this error in the dmesg logs:

[ 2660.476282] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[ 2660.476656] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
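For anyone who wants to check whether their own crashes match this signature, here is a minimal Python sketch that scans kernel log text for the message above and extracts the timestamp and GPU ID. The regular expression is written against the two lines quoted here and is an assumption about the format; other driver versions may word the message differently, and find_nvkms_alloc_failures is a name invented for this example.

```python
import re

# Matches lines like:
# [ 2660.476282] [drm:...] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
_NVKMS_FAIL_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+.*"
    r"\[GPU ID (0x[0-9a-fA-F]+)\] Failed to allocate NVKMS memory for GEM object"
)


def find_nvkms_alloc_failures(dmesg_text):
    """Return (seconds_since_boot, gpu_id) for each NVKMS allocation failure."""
    hits = []
    for line in dmesg_text.splitlines():
        match = _NVKMS_FAIL_RE.search(line)
        if match:
            hits.append((float(match.group(1)), match.group(2)))
    return hits


# Sample copied from the dmesg lines quoted in this post.
SAMPLE = """\
[ 2660.476282] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[ 2660.476656] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
"""

if __name__ == "__main__":
    for timestamp, gpu_id in find_nvkms_alloc_failures(SAMPLE):
        print(f"{timestamp:.6f}s: allocation failure on GPU {gpu_id}")
```

The input could be the saved output of dmesg or journalctl -k. Finding these lines right before a desktop crash would suggest the same out-of-VRAM failure described in this post, though it does not prove it.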

Related: 555 release feedback & discussion - #102 by Fijxu

omarhanykasban706 · July 3, 2024, 11:31am

GTX 1650, 4 GB VRAM

NVIDIA, why can’t you just add shared VRAM? It’s going to make Linux lighter than Windows. Please just add it, I really need it.
I can’t afford a new GPU, and on Windows I get over 60 FPS while on Linux I only get 20-30 FPS because of the non-existent shared VRAM.