Extreme (growing) memory usage in X11 OpenGL or Vulkan applications after suspend+resume

If an OpenGL application on X11 (which includes the compositors of window managers if they use GLX!) is running while the PC is sent to sleep (suspend-to-RAM), it will continuously use more RAM after resuming.
In the case of XFWM (the window manager of the XFCE desktop), which of course tends to run for a long time, it often continues to eat memory until there is none left and the machine crashes or becomes unusable; see XFWM4 Memory Leak (#825) · Issues · Xfce / xfwm4 · GitLab
One person reported that XFWM used 50GB of memory.
Someone else was able to trace it to glXSwapBuffers() (and everyone in that bug report uses nvidia binary drivers).
Note: With XFWM it only happens if the compositor is enabled.

Anyway, I was able to reproduce this problem independently of the window manager’s compositor, with a very simple example application: Minimal OpenGL on X11 example, to reproduce OpenGL driver bug · GitHub

If I start it and run watch "ps aux | grep glxsimple" in another terminal, I can see that the memory usage is constant, at first.

But after suspending my PC and resuming it (while glxsimple is running), the main memory usage of glxsimple continuously grows by about 2MB every four seconds.
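
For anyone who wants numbers instead of eyeballing the ps output, here is a minimal Python sketch that samples the resident memory of a process and computes the average growth rate. It assumes a Linux /proc filesystem with a VmRSS line in /proc/<pid>/status; the function names and the 4-second interval are just illustrative choices, not part of glxsimple or any driver tooling.

```python
import re
import time
from pathlib import Path


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def growth_kib_per_second(samples):
    """Average RSS growth between the first and last (seconds, KiB) sample."""
    if len(samples) < 2:
        return 0.0
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (rss1 - rss0) / (t1 - t0)


def watch_rss(pid, interval=4.0, count=10):
    """Sample VmRSS of `pid` `count` times, `interval` seconds apart."""
    samples = []
    for _ in range(count):
        text = Path(f"/proc/{pid}/status").read_text()
        rss = parse_vmrss_kib(text)
        if rss is not None:
            samples.append((time.monotonic(), rss))
            print(f"VmRSS: {rss} KiB")
        time.sleep(interval)
    return samples
```

Calling watch_rss() with the PID of glxsimple (for example from pgrep glxsimple) and passing the result to growth_kib_per_second() should give a value near zero before suspending. A growth of about 2MB every four seconds would correspond to roughly 512 KiB per second.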

It can of course also be reproduced with glxgears, but after starting glxgears it takes a bit longer than with glxsimple until its memory usage stops growing. That is before suspending the machine; after resuming, it grows anyway, maybe even faster than with glxsimple.

I’m using XUbuntu 24.04 with nvidia-driver-565 version 565.77-0ubuntu0~gpu24.04.1.
In the XFCE issue people report that it also happens with other driver versions like 565.57, 565.77, 570.86.16, 570.124.04 and the currently latest release, 570.133.07.
Reports about 550 versions were mixed: Some said downgrading to a 550 version of the driver fixed the issue for them, others said that they had this problem even with 550.

Update: This also affects Vulkan. vkgears shows the same behavior as glxgears or glxsimple (growing memory usage after suspend+resume).

By the way, here’s the bug report log (which was nontrivial to obtain because running nvidia-bug-report.sh with driver version 565.77 causes a kernel panic on my machine): nvidia-bug-report.log.gz (219.0 KB)

I just updated the driver to 570.124.04-0ubuntu0~gpu24.04.1.
At least nvidia-bug-report.sh doesn’t cause kernel panics anymore with this driver, but the memory usage problem persists. Here’s the log from the 570 driver: nvidia-bug-report.log.gz (402.9 KB)

Just want to mention that I have the same issue.

Likewise, I have the same issue on 570.133.07.

Hi All,
Thanks for reporting the issue. I have filed a bug [5204322] internally for tracking purposes.
It would be good to know whether it is indeed a regression and, if so, which driver was the last working one.

The problem did NOT exist in 550.144.03. It DOES exist in 565.57.01. I don’t know about anything in between.

There was a report that said it also existed in 550.

That may be true. I don’t recall having the problem in fedora 40 with that driver version.

We were able to reproduce the issue internally. I shall update further when I have an engineering update.

I’m fairly certain we’re seeing the same or similar issue on Windows with OpenGL - the newer drivers are leaking memory over time, without changes to our code.

I tracked down the problem that causes memory growth after a VT switch (or suspend & resume, which does a VT switch behind the scenes) and it should be fixed in a future release. The problem is definitely Linux-specific though, so any problem you’re seeing on Windows is unrelated.

You can work around the problem by enabling NVreg_PreserveVideoMemoryAllocations in the nvidia module parameters and enabling the relevant systemd units as described here: Chapter 21. Configuring Power Management Support
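
For reference, the module parameter is usually set with a modprobe option line such as options nvidia NVreg_PreserveVideoMemoryAllocations=1, and the units that chapter refers to should be nvidia-suspend.service, nvidia-hibernate.service and nvidia-resume.service; please check the chapter for your driver version rather than taking this as authoritative. To verify that the running module actually picked up the setting, something like the following Python sketch could be used. It assumes the driver exposes its parameters as "Name: value" lines in /proc/driver/nvidia/params; the helper names are made up for illustration.

```python
from pathlib import Path

# Assumed location of the loaded module's parameters.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def parse_params(text):
    """Parse 'Name: value' lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            params[name.strip()] = value.strip()
    return params


def preserve_enabled(text):
    """True if PreserveVideoMemoryAllocations is reported as 1."""
    return parse_params(text).get("PreserveVideoMemoryAllocations") == "1"


if __name__ == "__main__":
    if PARAMS_PATH.exists():
        state = "enabled" if preserve_enabled(PARAMS_PATH.read_text()) else "not enabled"
        print(f"PreserveVideoMemoryAllocations is {state}")
    else:
        print(f"{PARAMS_PATH} not found; is the nvidia module loaded?")
```

If it reports "not enabled" after editing the modprobe configuration, the module probably has not been reloaded yet (on many distributions that also means regenerating the initramfs and rebooting).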

I enabled NVreg_PreserveVideoMemoryAllocations but now I’m getting a black screen after resuming from suspend. Sometimes it takes me to the login screen after a minute, but other times it just stays black and I have to reboot.

I’m also getting random black screens if I don’t touch my keyboard for a minute or so. During these random black screens I can still hear audio and my mic works (people can hear me in Discord). The power saving settings are definitely set to never turn the screen off. Using the keyboard brings it back.

I definitely have enough free space in /tmp and have tried setting the temporary file path to other locations too. Any ideas?

Weird. Can you please generate and attach an nvidia-bug-report.log.gz after a failed resume? You might need to SSH into the system to do it remotely if the screen isn’t working.

Which desktop environment are you using and is it Xorg or Wayland?

I have been seeing the same issue with suspend/resume while using Fedora 42, xorg-x11-drv-nvidia-570.133.07-1.fc42.x86_64, kernel-6.14.2-300.fc42.x86_64, and Xorg.

Sorry for the delay. I haven’t been able to replicate the full failed resume yet. However, I am getting the black screen before login and random black screens after login, every 30 seconds. The only thing that solves it is running xset -dpms on each resume.
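
In case it helps someone script that step, here is a rough Python sketch that checks the DPMS state reported by xset q and only runs xset -dpms when DPMS is enabled. It assumes xset q prints a line reading "DPMS is Enabled" or "DPMS is Disabled", and the function names are hypothetical. How to trigger it on resume is left open, since that depends on the setup.

```python
import subprocess


def dpms_enabled(xset_q_output):
    """Return True/False from `xset q` text, or None if no DPMS line is found."""
    for line in xset_q_output.splitlines():
        line = line.strip()
        if line == "DPMS is Enabled":
            return True
        if line == "DPMS is Disabled":
            return False
    return None


def disable_dpms_if_needed():
    """Run `xset -dpms` only when `xset q` reports DPMS as enabled."""
    result = subprocess.run(
        ["xset", "q"], capture_output=True, text=True, check=True
    )
    if dpms_enabled(result.stdout):
        subprocess.run(["xset", "-dpms"], check=True)
        return True
    return False
```

disable_dpms_if_needed() has to run inside the X session (with DISPLAY set), otherwise xset cannot reach the server.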

I am using:

  • Ubuntu 24.04.2 LTS
  • Xorg
  • Nvidia Driver Version: 570.124.04
  • CUDA Version: 12.8
  • Kernel Version: 6.11.0-24-generic x86_64

Here is the log after a resume:
nvidia-bug-report.log.gz (402.9 KB)