Nvidia runtime D3 power management issues

I am using a laptop with a Quadro RTX 3000 card. External screens are attached to the Intel card and the RTX usually has nothing to do. I enabled PCI-Express runtime D3 power management, and at first glance it seems to be fine:

  • card suspends, since there is no activity (No running processes found)
  • if I start bumblebee I can use optirun to explicitly start an application on the nvidia card (e.g. steam games)

However, some applications (mainly video players) seem to wake the card and keep it awake. This happens with mpv and plexmediaplayer; VLC, however, does not seem to show this behaviour.

I can easily reproduce it by just starting mpv or plexmediaplayer. The card immediately wakes up, but does not show any running processes and idles in P8 (3W) mode until the player process exits.
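
The card's state can also be read independently of nvidia-smi, from the kernel's runtime PM files in sysfs. A minimal Python sketch, assuming the GPU sits at PCI address 0000:01:00.0 (a placeholder; lspci shows the real address on a given machine):

```python
from pathlib import Path

# Assumption: placeholder PCI address, replace with the one lspci reports.
DEFAULT_GPU = Path("/sys/bus/pci/devices/0000:01:00.0")


def runtime_pm_state(device_dir=DEFAULT_GPU):
    """Return (control, runtime_status) read from the device's power/ directory.

    control is "auto" when runtime PM is allowed and "on" when it is disabled;
    runtime_status is the kernel's view, e.g. "active" or "suspended".
    """
    power = Path(device_dir) / "power"
    control = (power / "control").read_text().strip()
    status = (power / "runtime_status").read_text().strip()
    return control, status
```

Calling runtime_pm_state() before and after starting mpv should show whether runtime_status really goes from "suspended" to "active" and stays there until the player exits.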

I guess it has something to do with probing for available hw-acceleration options. However, if the application is not using the card for decoding, the card should not stay awake, right?

Is that a known issue? Is anyone else observing this?

Edit:
I figured out that if either (or both) of the video players is running (not even playing a video), nvidia-smi shows 3MiB of video memory in use.

  • Why is there a small amount of memory allocated?
  • Shouldn’t the card go to sleep anyway? According to the D3 documentation, it should power the card down if there is enough system memory available to store the video memory (if I understand that correctly).

Any thoughts?

Am I the only one? :O

Using runtime PM with bumblebee isn’t actually the way it’s meant to be set up. Any reason you’re not using render offload?
Though this might not necessarily be the issue in your case. Plex Media Player uses mpv, and mpv uses libcuda, which loads the nvidia-uvm module. So this might be the reason the GPU is not suspending.
Still, you should see whether setting up render offload mitigates it.

I wish I could use render offload. I am on sway-wm, which is heavily incompatible with nvidia (or the other way around…). If I try to use render offload (uninstalling bumblebee, early loading of nvidia_drm), I end up in the following situation:

nvidia-smi always shows X as a running process, and the card never goes to sleep. Trying to run an application on the nvidia card results in a weird Xorg error message (I cannot remember exactly which one, but I could set it up again if that helps). If I do the same in an i3 session instead of sway, it works. So I guess it is a sway/wlroots issue.

Is that known to not work? Or am I missing something and my life could be way better??

Awesome information! Thanks a lot :)

Is it just loading the module and not using the card? Or is it secretly using the card w/o telling me?

OK, since you’re running a Wayland session and not an Xorg one, I guess you currently can’t use render offload.
In your case, mpv is not using the GPU, it just loads the library. You could try blacklisting the nvidia-uvm module; maybe it helps.
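
To check the "loads the lib but does not use the GPU" theory, one can look at which nvidia modules are loaded and which processes hold /dev/nvidia* device nodes open. A rough Python sketch; the function names are made up for illustration, and it only sees processes whose /proc/<pid>/fd it is allowed to read (run as root to see all of them):

```python
import os
from pathlib import Path


def nvidia_modules(proc_modules_text):
    """Map each loaded nvidia* module to its reference count (None if unknown)."""
    result = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("nvidia"):
            # Third column of /proc/modules is the use count; it can be "-".
            result[fields[0]] = int(fields[2]) if fields[2].isdigit() else None
    return result


def holders_of_nvidia_devices(proc_root="/proc"):
    """Return {pid: sorted device paths} for processes with /dev/nvidia* open."""
    holders = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            targets = {os.readlink(fd) for fd in (entry / "fd").iterdir()}
        except OSError:
            continue  # process exited or its fd directory is not readable
        devices = sorted(t for t in targets if t.startswith("/dev/nvidia"))
        if devices:
            holders[int(entry.name)] = devices
    return holders
```

Something like nvidia_modules(Path("/proc/modules").read_text()) together with holders_of_nvidia_devices() while mpv is running should show whether mpv keeps a device node open even though nvidia-smi lists no process.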

Yeah, I thought so. That really sucks :(

Do you know if there is any effort currently under way to change that? Or is that going to be a ‘does not work by design’ issue for the next few years?

Just for my understanding. It does work in a native X-session, so shouldn’t it work in an Xwayland session as well? Or why is this such a big issue?

Xwayland is a Wayland client, so it doesn’t use the X driver. IIRC, it was expected to work by the end of 2019; at least that is what the Xwayland devs were saying. I didn’t hear anything about it afterwards, though, so I don’t know the actual state.

Here is a roughly one-year-old news post about this issue. Might this be what solves the problem?

https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-GL-VLK-XWayland

Maybe, at least partially. Those changes at least make Xwayland nvidia gpu accel work on single gpu systems. I guess making use of that on hybrid graphics systems will require more work in the compositors (sway in your case).

I assumed that. Thank you.

Sway+nvidia, never gonna happen I guess… :(

Unfortunately, blacklisting did not help. I have nvidia_uvm in my blacklist.conf, but the module seems to get loaded anyway :(
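
One possible explanation, offered as an assumption rather than a diagnosis: per modprobe.d(5), a blacklist line only stops a module from being auto-loaded through its aliases, so it still loads when something requests it by name. An install line (for example `install nvidia_uvm /bin/false`) is the usual way to refuse explicit loads too, though that would presumably break anything that really needs CUDA. The hypothetical helper below just reports which of the two directives a config file contains for a module:

```python
def directives_for(module, conf_text):
    """Return the 'blacklist'/'install' directives in conf_text that name module.

    Dashes and underscores are treated as equal, as modprobe does for names.
    """
    wanted = module.replace("-", "_")
    found = []
    for raw in conf_text.splitlines():
        if raw.lstrip().startswith("#"):
            continue  # comment line
        fields = raw.split()
        if (
            len(fields) >= 2
            and fields[0] in ("blacklist", "install")
            and fields[1].replace("-", "_") == wanted
        ):
            found.append(fields[0])
    return found
```

If directives_for("nvidia_uvm", text_of_blacklist_conf) returns only ["blacklist"], the module can still be pulled in by an explicit modprobe call.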