Wayland: applications freezing sporadically, suspected vram issues

hecate · April 8, 2025, 10:03pm

summary: applications will sometimes freeze and cease rendering new frames (“render halt”, as a hang implies it resumes at some point, which this never does). Application audio will continue, and inputs are still processed under the hood.

Nvidia-open vs nvidia (proprietary) does not make a difference. Proprietary drivers w/ GSP disabled did not prevent this either.

Graphics cards with more vram (e.g. my 3090 with 24gib of vram) are not immune to this; they just have this happen less frequently. Happens to both wayland and xwayland windows (e.g. it has happened once or twice to an alacritty / ghostty window; both are native-wayland, gpu-accelerated terminals).

Details:

I do not think that explicit sync is actually related either.
This has been a perpetual issue for my friend with a 3080 and otherwise identical setup to mine - however, it does rarely happen to me, and happened to my Steam window this morning at about 6am while I was still asleep, according to logs:

sudo dmesg -e | tail -n2
[Apr 8 06:35] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  +0.000035] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
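For anyone comparing logs across machines, here is a rough sketch of how these lines could be tallied per GPU ID. The regex is written against the exact message text quoted in this thread (the allocation failure above and the mapping failure that appears in later posts); the function name and the inline sample are just for illustration.

```python
import re
from collections import Counter

# Matches the nvidia-drm allocation failure quoted above, plus the related
# mapping failure seen later in this thread. Group 1 captures the GPU ID.
NVKMS_ERROR = re.compile(
    r"\[nvidia-drm\] \[GPU ID (0x[0-9a-fA-F]+)\] "
    r"(Failed to allocate NVKMS memory for GEM object|Failed to map NvKmsKapiMemory)"
)


def count_nvkms_errors(log_text):
    """Return a Counter keyed by (gpu_id, message) for every matching log line."""
    counts = Counter()
    for line in log_text.splitlines():
        match = NVKMS_ERROR.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# The two dmesg lines from this post, used as sample input.
sample = """\
[Apr 8 06:35] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  +0.000035] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
"""

for (gpu_id, message), n in count_nvkms_errors(sample).items():
    print(n, gpu_id, message)
```

To run it against a live system, the sample string would be swapped for the text of `sudo dmesg` (for example read from standard input).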

From the looks of things, Fossilize started segfaulting a lot after that; my best guess is that fossilize was.. doing something? with shader pre-caching or something in the background, and that chewed up enough vram to make my Steam window explode (& then released the vram, because only ~3.5/24gb were used when I poked at my desktop about this).

This has been happening very consistently for some friends (who I will direct to post bug reports in this thread ☺️) - we’ve done a lot of poking between window managers, driver versions, proprietary drivers w/ GSP disabled, etc - and I generally struggle to reproduce this with my 3090, but it can still happen OVERNIGHT to a random window on my system.

sidenote: this thread looks somewhat related, but it’s being observed with any vulkan/gl applications in wayland, regardless of xwayland - a native wayland gl-accelerated terminal freezing during runtime is weird. These issues we’re observing are separate from the vram issues surrounding suspend/resume (me & my friends are “turn displays off but leave system running overnight” people, because uptime is all about those 9’s)

Ideally, vram could be paged out to system ram when full (as degraded performance is preferable to render halts). Also ideally, applications wouldn’t balloon their vram usage infinitely, but we can’t all be winners, unfortunately.

sariya_m · April 9, 2025, 4:39am

Hi,

I’m the friend in question. Here’s the output of sudo nvidia-bug-report.sh --extra-system-data, taken immediately after Discord froze and subsequently crash-looped 3 times – I can reproduce this any time with discord-canary with hardware acceleration on. (Discord stable, with hardware acceleration off, does not exhibit the same issue.) When this issue happens, the following error is observed:

[Tue Apr  8 20:38:06 2025] [drm:__nv_drm_gem_nvkms_map [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to map NvKmsKapiMemory 0x0000000041e744b9
[Tue Apr  8 20:38:22 2025] [drm:__nv_drm_gem_nvkms_map [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to map NvKmsKapiMemory 0x000000006ddb51d5
[Tue Apr  8 20:38:38 2025] [drm:__nv_drm_gem_nvkms_map [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to map NvKmsKapiMemory 0x00000000d7f128b0
[Tue Apr  8 20:38:52 2025] [drm:__nv_drm_gem_nvkms_map [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to map NvKmsKapiMemory 0x000000002b03a0af
[Tue Apr  8 21:04:56 2025] [drm:__nv_drm_gem_nvkms_map [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to map NvKmsKapiMemory 0x00000000af0cc908
[Tue Apr  8 21:05:12 2025] [drm:__nv_drm_gem_nvkms_map [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to map NvKmsKapiMemory 0x000000005507ab12

nvidia-bug-report.log.old.gz (1.4 MB)

And here is a more “normal” exhibition of the bug: while playing DIRT Rally 2, the game froze on its last drawn frame at some point (presumably when it tried to allocate some VRAM and failed because VRAM was nearly full at the time, per nvtop).
This one is preceded by this message in dmesg:

[Tue Apr  8 21:33:56 2025] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to allocate NVKMS memory for GEM object
[Tue Apr  8 21:33:56 2025] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to allocate NVKMS memory for GEM object
[Tue Apr  8 21:33:56 2025] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to allocate NVKMS memory for GEM object
[Tue Apr  8 21:33:56 2025] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to allocate NVKMS memory for GEM object
[Tue Apr  8 21:33:56 2025] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to allocate NVKMS memory for GEM object
[Tue Apr  8 21:33:56 2025] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to allocate NVKMS memory for GEM object

nvidia-bug-report.log.gz (1.3 MB)

I was watching nvtop the entire time while gaming trying to trigger the bug the second time, and can definitely confirm this bug seems most likely to occur at the very high end of memory usage – which is especially brutal on my 3080 with only 10Gi of vram.
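For anyone who would rather log this than eyeball nvtop, a minimal sketch follows. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` CSV output; the function names are made up for illustration, and the sample reading at the bottom is invented, not taken from my system.

```python
import subprocess


def parse_memory_csv(text):
    """Parse 'used, total' MiB pairs (one line per GPU) into (used, total, fraction) tuples."""
    readings = []
    for line in text.strip().splitlines():
        used_str, total_str = (field.strip() for field in line.split(","))
        used, total = int(used_str), int(total_str)
        readings.append((used, total, used / total))
    return readings


def query_vram():
    """Ask nvidia-smi for per-GPU memory usage in MiB. Needs the NVIDIA driver utilities."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_csv(out)


# Invented sample in the same shape nvidia-smi prints with the flags above.
for used, total, fraction in parse_memory_csv("9871, 10240\n"):
    print(f"{used} / {total} MiB ({fraction:.0%})")
```

Calling `query_vram()` in a loop with a timestamp would give a record of how close to full the card was when a freeze hit.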

hecate · April 13, 2025, 1:30am

[Apr12 17:33] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  +0.000026] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object

happened again on my host an hour ago; this time it happened while I was away (the screen was locked & display was off) and I was ssh’d in to the host. 4gb vram in use according to nvtop. Kinda baffling.

Upon my return, the game I had left open was exhibiting the same behavior (no new frames, but still gave audio & would respond to inputs, judging by audio)

hecate · April 14, 2025, 1:34am

Alright - so some games (Deadlock in particular, amusingly enough) love to freeze when left open with my screen locked and displays off.

Same nv_drm_gem_alloc_nvkms_memory_ioctl error as above. I have yet to be able to ‘quickly’ repro this, as it only happens when something is left open for several hours while I’m away.

You might be inclined to say “it’s a bit silly to leave games open for hours on end while afk, don’t do that” but: they simply have this happen the most consistently. This occasionally also hits one of my Ghostty windows overnight. Hopefully soon I can manage to catch this relatively quickly and get a (maybe more useful?) bug-report.sh that isn’t hours after the fact.

hecate · April 21, 2025, 7:59pm

[Apr21 12:20] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  +0.000044] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  +3.407522] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  +0.000033] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[ +14.887412] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
[  +0.000044] [drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object

I can consistently reproduce this by leaving my desktop idle (-> swaylock → displays off after 10m) with Deadlock open. After about an hour or so of idle time, I’ll return to the game being stuck frozen, with many of the above loglines. Whatever it is that causes the vram to get gobbled up releases it all by the time I unlock my computer.

This also happens overnight with the basic Steam x11 (via xwayland-satellite) window. Usually after about ~6 hours, while I’m asleep.

It looks like there’s a few issues here:

vram use climbs while desktop is idle and locked. Unrelated to the issues about suspend/wake I’ve seen
when out of vram, things freeze catastrophically due to not paging vram ↔ system ram

If vram would at least page to system ram, it’d be easier to diagnose the first issue here…
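Until then, one way to put a number on the first issue is to sample vram use while the desktop sits locked and work out the growth rate. A small sketch, where the helper name and the sample figures are invented for illustration:

```python
def growth_mib_per_hour(samples):
    """Average VRAM growth between the first and last (unix_seconds, used_mib) sample."""
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time range")
    return (used1 - used0) * 3600 / (t1 - t0)


# Invented samples: 3500 MiB at lock time, 9500 MiB ninety minutes later.
samples = [(0, 3500), (2700, 6400), (5400, 9500)]
print(f"{growth_mib_per_hour(samples):.0f} MiB/hour")
```

A steadily positive rate while nothing is being drawn to a display would point at the idle climb rather than at a single large allocation.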

nvidia-bug-report.log.gz (520.0 KB)

chaosrifle · April 26, 2025, 6:59am

can reproduce on 5600x 4070ti on nobara41 wayland plasma, and endeavour wayland plasma, and endeavour x11 plasma.

easiest way to reproduce it is to slam the gpu with a game that is behaving badly lately, DCS World Standalone on a moderately busy server does it pretty fast. if that still doesn't trigger it, enable a targeting pod, should get you no problem. other apps freeze up and won't recover until you terminate the game to free the vram, and the whole desktop gets stuttery. some applications need to be restarted as they will never unfreeze.

LinVidia21 · April 27, 2025, 5:54am

I am having this same issue as well. Last Epoch seems to be the game.

The other programs just kind of hang (discord steam, etc) but only visually. They fully function otherwise.

I also occasionally get an alert that plasma had dropped back to software rendering if it happens long enough.

Something is screwy with the new 570 driver I think, because I did not have this previously.

heberb · April 30, 2025, 10:27pm

Also getting this error. For me it's Halo 2 Anniversary after 2+ hours of gameplay and Gmod after a few minutes with heavy addons. The games seem to be working fine under the hood if I attempt to play them; it's just the display that gets killed.

My steam client window also freezes, but it starts working again if I open and close it.

jNines · April 30, 2025, 10:48pm

I’ve been having the same issue with Last Epoch since the latest patch where they upgraded Unity. Usually start having issues within an hour on a 3080 10GB. Nothing locks up, but fps will drop from ~120 to below 30 and I’ll get the Failed to allocate NVKMS memory for GEM object error.
nvidia-bug-report.log.gz (1.3 MB)

hecate · May 8, 2025, 2:17am

This is still an issue a month later.

See also Non-existent shared VRAM on NVIDIA Linux drivers - apparently also entirely without acknowledgement or comment.

This is a solved engineering problem space, man. This is necessary functionality for end user systems.

sariya_m · May 20, 2025, 6:53pm

Still happening, and I’ve had to start coping by always having nvtop up and dodging maxing out my vram ever ever lest all of my apps freeze.

hecate · May 20, 2025, 7:56pm

wish i could edit the title to ‘Wayland: applications freeze when vram alloc fails, paging to system ram needed’ or something along those lines

i’ve frankensteined the GPU memory stats bits out of nvtop and rolled it into a waybar module here so that at 90% vram util I can take steps to prevent my system from freezing.
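for the curious, the general shape of such a module is roughly this. to be clear, this is a simplified sketch and not the module linked above: it assumes a waybar custom module configured with "return-type": "json", and the function name, class names and sample numbers are made up.

```python
import json


def waybar_payload(used_mib, total_mib, warn_at=0.90):
    """Build the JSON object a Waybar custom module reads when "return-type" is "json"."""
    fraction = used_mib / total_mib
    percent = round(fraction * 100)
    return {
        "text": f"VRAM {percent}%",
        "tooltip": f"{used_mib} MiB / {total_mib} MiB",
        # The class can be styled in Waybar's CSS, e.g. to turn the module red.
        "class": "critical" if fraction >= warn_at else "normal",
        "percentage": percent,
    }


# Invented reading; a real module would feed in live numbers on each interval.
print(json.dumps(waybar_payload(9400, 10240)))
```

the 90% threshold is the only real logic here; everything else is plumbing to get a number onto the bar.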

userland should not have to resort to measures like this to maintain system operability

renari · June 14, 2025, 8:04pm

This is happening to me consistently when playing Stellar Blade: my background windows freeze and stop repainting. It most commonly happens to Discord and my browser, and it’s reproducible every time I play the game.

Video Card:
Driver: NVIDIA Corporation NVIDIA GeForce RTX 4070 Ti SUPER/PCIe/SSE2
Driver Version: 4.6.0 NVIDIA 575.57.08

renari · June 14, 2025, 8:34pm

Related issue:

Also here’s my log, but it’s basically the same as the above errors.
nvidia-bug-report.log.gz (598.8 KB)

hecate · July 5, 2025, 1:48am

Good lord. I have a fun partial solution here.

Turns out that an application profile that sets GLVidHeapReuseRatio=1 against my compositor’s process name can reduce its idle memory usage from 2668MiB to 168MiB, and saving 2.5GiB of vram (or more?) has dropped my vram footprint significantly.
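A sketch of what such a profile can look like, generated from Python so the structure is easy to see. The rules/profiles layout follows my understanding of the application profile format described in the NVIDIA driver README; the process name "my-compositor" and the profile name are placeholders, and where the file needs to live (something like a directory under /etc/nvidia) is an assumption to check against the README for your driver version.

```python
import json


def build_profile(process_name, reuse_ratio=1):
    """Return an application-profile document applying GLVidHeapReuseRatio to one process name."""
    profile_name = f"Limit heap reuse for {process_name}"  # arbitrary label
    return {
        "rules": [
            {
                # Match on the process name, as described in this post.
                "pattern": {"feature": "procname", "matches": process_name},
                "profile": profile_name,
            }
        ],
        "profiles": [
            {
                "name": profile_name,
                "settings": [{"key": "GLVidHeapReuseRatio", "value": reuse_ratio}],
            }
        ],
    }


# "my-compositor" is a placeholder; substitute the compositor's real process name.
print(json.dumps(build_profile("my-compositor"), indent=4))
```

The compositor has to be restarted before a new profile takes effect, since profiles are read when the process loads the driver libraries.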

Dunno if applying this against other processes will be meaningful or needed, though.

renari · July 5, 2025, 7:40am

That sounds like an “it's broken” difference lol.

hecate · July 5, 2025, 9:33am

absolutely. it’s kinda dumbfounding how the solution here is:

a random json file in etc
that does a string match on a process name
to change allocator behavior based on a knob
and this was sorta noted in release notes briefly with a mention of default profiles for “some” Wayland compositors

Admittedly a control knob for a heap allocator is incredibly normal, but that being a magic bullet to the tune of 2.5GiB of vram is rather mind boggling.

I’m sure there’s some reason why this couldn’t be done more sanely, but it’s a complete mystery to me as a consumer.

Things jamming up when vram nears full (given the lack of vram paging) is still possible, I think, but vastly less likely to occur from now on with this increased headroom (and decreased growth during use - I don’t think I’ll be seeing my compositor spiral up to 4 or more GiB of used vram anymore now that it’s down below 200MiB constantly…)

BlueGoliath · July 5, 2025, 12:19pm

Sounds like Nvidia was listening to the geniuses in the Linux community claiming that unused RAM is wasted RAM.

renari · July 5, 2025, 11:24pm

In all fairness this would be perfectly fine if shared memory was working.

renari · July 5, 2025, 11:51pm

Reading more of this thread, it seems these profiles have been bundled in the driver since 565.77. Are you on an older driver version? If not, this shouldn’t change anything.
