Extreme (growing) memory usage in X11 OpenGL or Vulkan applications after suspend+resume

If an OpenGL application on X11 (which includes window managers’ compositors if they use GLX!) is running while the PC is sent to sleep (suspend-to-RAM), it will continuously use more RAM after resuming.
In the case of XFWM (the window manager of the XFCE desktop), which of course tends to run for a long time, it often continues to eat memory until there is none left and the machine crashes or becomes unusable; see XFWM4 Memory Leak (#825) · Issues · Xfce / xfwm4 · GitLab
One person reported that XFWM used 50GB of memory.
Someone else was able to trace it to glXSwapBuffers() (and everyone in that bug report uses nvidia binary drivers).
Note: With XFWM it only happens if the compositor is enabled.

Anyway, I was able to reproduce this problem independently of the window manager’s compositor, with a very simple example application: Minimal OpenGL on X11 example, to reproduce OpenGL driver bug · GitHub

If I start it and run watch "ps aux | grep glxsimple" in another terminal, I can see that the memory usage is constant, at first.

But after suspending my PC and resuming it (while glxsimple is running), the main memory usage of glxsimple continuously grows by about 2MB every four seconds.
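
The growth described above can also be logged with a timestamp and a per-sample delta, which is easier to compare across driver versions than eyeballing watch output. The following is only a sketch: the helper names rss_kib and watch_rss are made up for illustration, and it assumes a Linux /proc filesystem and a POSIX shell.

```shell
# rss_kib PID: print the resident set size of PID in KiB (VmRSS from /proc).
rss_kib() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# watch_rss PID [INTERVAL_SECONDS] [SAMPLES]
# Print one line per sample with the current RSS and the change since the
# previous sample. Stops early if the process goes away.
watch_rss() {
    pid=$1
    interval=${2:-4}
    samples=${3:-10}
    prev=$(rss_kib "$pid")
    [ -n "$prev" ] || { echo "cannot read RSS of PID $pid" >&2; return 1; }
    i=0
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        cur=$(rss_kib "$pid")
        [ -n "$cur" ] || return 1
        echo "$(date +%T) rss=${cur} KiB delta=$((cur - prev)) KiB"
        prev=$cur
        i=$((i + 1))
    done
}
```

For example, watch_rss "$(pgrep -o -x glxsimple)" 4 30 samples the oldest glxsimple process every four seconds for two minutes. Before a suspend the delta should settle around zero; after a resume on an affected driver it should stay positive.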

It can of course also be reproduced with glxgears, but after starting glxgears it takes a bit longer than with glxsimple until the memory usage stops growing (that is before suspending the machine; after resuming it grows anyway, maybe even faster than with glxsimple).

I’m using XUbuntu 24.04 with nvidia-driver-565 version 565.77-0ubuntu0~gpu24.04.1.
In the XFCE issue, people report that it also happens with other driver versions like 565.57, 565.77, 570.86.16, 570.124.04 and the currently latest release, 570.133.07.
Reports about 550 versions were mixed: Some said downgrading to a 550 version of the driver fixed the issue for them, others said that they had this problem even with 550.

Update: This also affects Vulkan: vkgears shows the same behavior as glxgears or glxsimple (growing memory usage after suspend+resume).

By the way, here’s the bug report log (which was nontrivial to obtain because running nvidia-bug-report.sh with driver version 565.77 causes a kernel panic on my machine): nvidia-bug-report.log.gz (219.0 KB)

I just updated the driver to 570.124.04-0ubuntu0~gpu24.04.1.
At least nvidia-bug-report.sh doesn’t cause kernel panics anymore with this driver, but the memory usage problem persists. Here’s the log from the 570 driver: nvidia-bug-report.log.gz (402.9 KB)

Just want to mention that I have the same issue.

Likewise, I have the same issue on 570.133.07.

Hi All,
Thanks for reporting the issue. I have filed a bug [5204322] internally for tracking purposes.
It would be good to know whether it is indeed a regression and, if so, which driver version last worked.

The problem did NOT exist in 550.144.03. It DOES exist in 565.57.01. I don’t know about anything in between.

There was a report that said it also existed in 550.

That may be true. I don’t recall having the problem in fedora 40 with that driver version.

We were able to reproduce the issue internally. I will post an update here when I have one from engineering.

I’m fairly certain we’re seeing the same or a similar issue on Windows with OpenGL: the newer drivers are leaking memory over time, without changes to our code.

I tracked down the problem that causes memory growth after a VT switch (or suspend & resume, which does a VT switch behind the scenes) and it should be fixed in a future release. The problem is definitely Linux-specific though, so any problem you’re seeing on Windows is unrelated.

You can work around the problem by enabling NVreg_PreserveVideoMemoryAllocations in the nvidia module parameters and enabling the relevant systemd units as described here: Chapter 21. Configuring Power Management Support
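
As a rough sketch of what that workaround looks like in practice: the option name and the three systemd units come from the NVIDIA power management documentation, while the file name under /etc/modprobe.d is an arbitrary choice and the initramfs step depends on the distribution. Treat this as an outline to adapt, not a verified recipe for any particular system.

```shell
# Persist the module option (the file name is an assumption; any *.conf works).
echo 'options nvidia NVreg_PreserveVideoMemoryAllocations=1' \
    | sudo tee /etc/modprobe.d/nvidia-power-management.conf

# Optional: NVreg_TemporaryFilePath=<dir> selects where video memory is saved
# during suspend; the directory needs enough free space for it.

# Enable the units that save and restore video memory around suspend/hibernate.
sudo systemctl enable nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service

# If the nvidia module is loaded from the initramfs, regenerate it
# (for example update-initramfs -u on Ubuntu or dracut --force on Fedora),
# then reboot.

# After rebooting, check that the driver picked up the setting.
grep PreserveVideoMemoryAllocations /proc/driver/nvidia/params
```

If the last command prints a value of 1, the option is active for the loaded module.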

I enabled NVreg_PreserveVideoMemoryAllocations, but now I’m getting a black screen after resuming from suspend. Sometimes it takes me to the login screen after a minute, but other times it just stays black and I have to reboot.

I’m also getting random blackscreens if I don’t touch my keyboard for a minute or so. During these random blackscreens I can still hear audio and my mic works (people can hear me in discord). The power saving settings are definitely set to never turn the screen off. Using the keyboard brings it back.

I definitely have enough space in /tmp and have tried setting it to other paths too. Any ideas?

Weird. Can you please generate and attach an nvidia-bug-report.log.gz after a failed resume? You might need to SSH into the system to do it remotely if the screen isn’t working.

Which desktop environment are you using and is it Xorg or Wayland?

I have been seeing the same suspend/resume issue while using Fedora 42, xorg-x11-drv-nvidia-570.133.07-1.fc42.x86_64, kernel-6.14.2-300.fc42.x86_64, and Xorg.

Sorry for the delay. I haven’t been able to replicate the full failed resume yet. However, I am getting the black screen before login and random black screens after login, every 30 seconds. The only thing that solves it is running xset -dpms on each resume.

I am using:

  • Ubuntu 24.04.2 LTS
  • Xorg
  • Nvidia Driver Version: 570.124.04
  • CUDA Version: 12.8
  • Kernel Version: 6.11.0-24-generic x86_64

Here is the log after a resume:
nvidia-bug-report.log.gz (402.9 KB)

Interesting find that xset -dpms fixes it. In your bug report log, xset -q says that the DPMS standby timeout is 1200 seconds and the poweroff timeout is 3600, so it shouldn’t be shutting off after 30 seconds. To confirm, is that 30 seconds of the system being idle, or does it turn off the screen even if you’re actively moving the mouse or pressing keys?

30 seconds of the system being idle. If I’m moving the mouse or pressing keys, it doesn’t do this. I wonder if it could be an unrelated bug, though.
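
For anyone comparing their own setup with the DPMS values discussed above, the relevant lines can be pulled out of xset -q with a small filter. This is a sketch only: dpms_summary is a made-up helper name, and it assumes the usual "Standby: … Suspend: … Off: …" and "DPMS is …" lines in the xset -q output.

```shell
# dpms_summary: read `xset -q` output on stdin and print the DPMS timeouts
# (in seconds) and whether DPMS is enabled.
dpms_summary() {
    awk '
        /Standby:/ && /Suspend:/ && /Off:/ {
            print "standby=" $2 " suspend=" $4 " off=" $6
        }
        /DPMS is/ { print "dpms=" $3 }
    '
}
```

Run it as xset -q | dpms_summary inside the X session, once before suspending and once after resuming. If the timeouts still read 1200 and 3600 while the screen blanks after 30 seconds of idle time, something other than the X server's DPMS timers is requesting the power-save mode.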

The latest driver (570.153.02) seems to fix this issue \o/

I managed to reproduce this by switching from Arch Linux to Fedora. This seems really unlikely to be a graphics driver bug but I’ll take a look anyway.

On my system, this call is coming from somewhere in the JavaScript code in gnome-shell. I think this is going to have to be debugged by someone more familiar with the internals of GNOME. I see in your bug report log it says that you’re running xfce4 – if you check ps aux do you see any GNOME components running?

(gdb) bt
#0  DPMSForceLevel (dpy=0x5619d3a47ef0, level=level@entry=3) at /usr/src/debug/libXext-1.3.6-3.fc42.x86_64/src/DPMS.c:240
#1  0x00007f24400dc114 in meta_monitor_manager_xrandr_set_power_save_mode (manager=0x5619d37eb4d0, mode=<optimized out>) at ../src/backends/x11/meta-monitor-manager-xrandr.c:189
#2  meta_monitor_manager_xrandr_set_power_save_mode (manager=0x5619d37eb4d0, mode=META_POWER_SAVE_OFF) at ../src/backends/x11/meta-monitor-manager-xrandr.c:165
#3  0x00007f2440062d1f in power_save_mode_changed (manager=0x5619d37eb4d0, pspec=<optimized out>, user_data=<optimized out>) at ../src/backends/meta-monitor-manager.c:466
#4  0x00007f2440c22b1a in g_closure_invoke (closure=0x5619d3a6ba00, return_value=0x0, n_param_values=2, param_values=0x7ffe351c6b20, invocation_hint=0x7ffe351c6a70) at ../gobject/gclosure.c:833
#5  0x00007f2440c40aba in signal_emit_unlocked_R (node=node@entry=0x7ffe351c6c30, detail=detail@entry=611, instance=instance@entry=0x5619d3a31d20, emission_return=emission_return@entry=0x0, instance_and_params=instance_and_params@entry=0x7ffe351c6b20) at ../gobject/gsignal.c:3902
#6  0x00007f2440c42adc in signal_emit_valist_unlocked (instance=instance@entry=0x5619d3a31d20, signal_id=signal_id@entry=1, detail=detail@entry=611, var_args=var_args@entry=0x7ffe351c6d90) at ../gobject/gsignal.c:3534
#7  0x00007f2440c42d58 in g_signal_emit_valist (instance=0x5619d3a31d20, signal_id=1, detail=611, var_args=var_args@entry=0x7ffe351c6d90) at ../gobject/gsignal.c:3277
#8  0x00007f2440c42e13 in g_signal_emit (instance=<optimized out>, signal_id=<optimized out>, detail=<optimized out>) at ../gobject/gsignal.c:3597
#9  0x00007f2440c2ebb4 in g_object_dispatch_properties_changed (object=0x5619d3a31d20, n_pspecs=<optimized out>, pspecs=<optimized out>) at ../gobject/gobject.c:1851
#10 0x00007f2440c232fa in g_object_notify_queue_thaw (object=0x5619d3a31d20, nqueue=<optimized out>, take_ref=0) at ../gobject/gobject.c:761
#11 0x00007f2440c36b16 in g_object_setv (object=0x5619d3a31d20, n_properties=<optimized out>, names=<optimized out>, values=<optimized out>) at ../gobject/gobject.c:3110
#12 g_object_setv (object=0x5619d3a31d20, n_properties=<optimized out>, names=<optimized out>, values=<optimized out>) at ../gobject/gobject.c:3077
#13 0x00007f2440c36d41 in g_object_set_property (object=object@entry=0x5619d3a31d20, property_name=<optimized out>, value=value@entry=0x7ffe351c6ff0) at ../gobject/gobject.c:3406
#14 0x00007f24400278c1 in _meta_dbus_display_config_skeleton_handle_set_property (connection=<optimized out>, sender=<optimized out>, object_path=object_path@entry=0x7f241c182840 "/org/gnome/Mutter/DisplayConfig", interface_name=interface_name@entry=0x7f24401fd290 "org.gnome.Mutter.DisplayConfig", property_name=property_name@entry=0x5619d60de10f "PowerSaveMode", variant=variant@entry=0x5619d48234b0, 
    error=0x7ffe351c7080, user_data=0x5619d3a31d20) at src/meta-dbus-display-config.c:4654
#15 0x00007f244070b22b in invoke_set_property_in_idle_cb (_data=_data@entry=0x7f241c12a050) at ../gio/gdbusconnection.c:4807
#16 0x00007f2440b0e76d in g_idle_dispatch (source=0x7f241c386980, callback=0x7f244070b180 <invoke_set_property_in_idle_cb>, user_data=0x7f241c12a050) at ../glib/gmain.c:6284
#17 0x00007f2440b08040 in g_main_dispatch (context=0x5619d3663d40) at ../glib/gmain.c:3398
#18 g_main_context_dispatch_unlocked (context=0x5619d3663d40) at ../glib/gmain.c:4249
#19 0x00007f2440b11128 in g_main_context_iterate_unlocked (context=0x5619d3663d40, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4314
#20 0x00007f2440b113cf in g_main_loop_run (loop=0x5619d40f0cb0) at ../glib/gmain.c:4516
#21 0x00007f24400a415a in meta_context_run_main_loop (context=<optimized out>, error=0x7ffe351c7640) at ../src/core/meta-context.c:576
#22 0x00007f243f431056 in ffi_call_unix64 () at ../src/x86/unix64.S:104
#23 0x00007f243f42cd08 in ffi_call_int (cif=cif@entry=0x5619d415e8a0, fn=fn@entry=0x7f24400a40f0 <meta_context_run_main_loop>, rvalue=<optimized out>, rvalue@entry=0x7ffe351c7488, avalue=avalue@entry=0x5619d41da930, closure=closure@entry=0x0) at ../src/x86/ffi64.c:673
#24 0x00007f243f42f70e in ffi_call (cif=0x5619d415e8a0, fn=0x7f24400a40f0 <meta_context_run_main_loop>, rvalue=0x7ffe351c7488, avalue=0x5619d41da930) at ../src/x86/ffi64.c:710
#25 0x00007f24404c6b14 in Gjs::Function::invoke (this=0x5619d415e880, context=0x5619d368d270, args=..., this_obj=..., r_value=<optimized out>) at ../gi/function.cpp:1050
#26 0x00007f24404c744f in Gjs::Function::call (context=0x5619d368d270, js_argc=<optimized out>, vp=<optimized out>) at /usr/include/mozjs-128/js/RootingAPI.h:616
#27 0x00007f243de26972 in CallJSNative (cx=0x5619d368d270, native=0x7f24404c7380 <Gjs::Function::call(JSContext*, unsigned int, JS::Value*)>, reason=js::CallReason::Call, args=...) at /usr/src/debug/mozjs128-128.8.1-1.fc42.x86_64/js/src/vm/Interpreter.cpp:481
#28 js::InternalCallOrConstruct (cx=0x5619d368d270, args=..., construct=<optimized out>, reason=js::CallReason::Call) at /usr/src/debug/mozjs128-128.8.1-1.fc42.x86_64/js/src/vm/Interpreter.cpp:561
#29 0x00007f243de35ff1 in InternalCall (cx=0x5619d368d270, args=..., reason=<optimized out>) at /usr/src/debug/mozjs128-128.8.1-1.fc42.x86_64/js/src/vm/Interpreter.cpp:642
#30 js::CallFromStack (cx=0x5619d368d270, args=..., reason=<optimized out>) at /usr/src/debug/mozjs128-128.8.1-1.fc42.x86_64/js/src/vm/Interpreter.cpp:647
#31 js::Interpret (cx=0x5619d368d270, state=...) at /usr/src/debug/mozjs128-128.8.1-1.fc42.x86_64/js/src/vm/Interpreter.cpp:3190
#32 0x00007f243de260fc in MaybeEnterInterpreterTrampoline (cx=0x5619d3a47ef0, state=...) at /usr/src/debug/mozjs128-128.8.1-1.fc42.x86_64/js/src/vm/Interpreter.cpp:395
#33 js::RunScript (cx=0x5619d3a47ef0, state=...) at /usr/src/debug/mozjs128-128.8.1-1.fc42.x86_64/js/src/vm/Interpreter.cpp:453
#34 0x00007f243de26890 in js::InternalCallOrConstruct (cx=0x5619d368d270, args=..., construct=js::NO_CONSTRUCT, reason=<optimized out>) at /usr/src/debug/mozjs128-128.8.1-1.fc42.x86_64/js/src/vm/Interpreter.cpp:607
#35 0x00007f243de26e6c in InternalCall (cx=0x5619d3a47ef0, args=..., reason=js::CallReason::Call) at /usr/src/debug/mozjs128-128.8.1-1.fc42.x86_64/js/src/vm/Interpreter.cpp:642
#36 js::Call (cx=0x5619d3a47ef0, fval=..., thisv=..., args=..., rval=..., reason=js::CallReason::Call) at /usr/src/debug/mozjs128-128.8.1-1.fc42.x86_64/js/src/vm/Interpreter.cpp:674
#37 0x00007f243df010d2 in JS::Call (cx=0x5619d368d270, thisv=..., fval=..., args=..., rval=...) at /usr/src/debug/mozjs128-128.8.1-1.fc42.x86_64/js/src/vm/CallAndConstruct.cpp:119
#38 0x00007f2440509552 in JS::Call (cx=<optimized out>, thisv=..., funObj=..., args=..., rval=...) at /usr/include/mozjs-128/js/RootingAPI.h:1229
#39 GjsContextPrivate::run_main_loop_hook (this=0x5619d36846e0) at ../gjs/context.cpp:1399
#40 0x00007f2440511baf in GjsContextPrivate::eval_module (this=0x5619d36846e0, identifier=0x5619d40b65f0 "resource:///org/gnome/shell/ui/init.js", exit_status_p=0x7ffe351c802b "?\001", error=0x7ffe351c8030) at ../gjs/context.cpp:1516
#41 gjs_context_eval_module (js_context=<optimized out>, identifier=0x5619d40b65f0 "resource:///org/gnome/shell/ui/init.js", exit_code=0x7ffe351c802b "?\001", error=0x7ffe351c8030) at ../gjs/context.cpp:1295
#42 0x00007f2440511de1 in gjs_context_eval_module_file (js_context=js_context@entry=0x5619d3684860, filename=filename@entry=0x5619cefdb100 "resource:///org/gnome/shell/ui/init.js", exit_status_p=exit_status_p@entry=0x7ffe351c802b "?\001", error=error@entry=0x7ffe351c8030) at ../gjs/context.cpp:1600
#43 0x00005619cefd7303 in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.c:691