Nvidia-powerd service causes microstutters

Initially posted as an issue on the open kernel modules repository: "nvidia-powerd causes microstuttering" (NVIDIA/open-gpu-kernel-modules, issue #1061 on GitHub).

However, nvidia-powerd is a userspace daemon, so this forum is probably a better fit.

Description:
nvidia-powerd causes intermittent microstutters: the frametime graph in MangoHud becomes jagged for a few seconds, and the stutter is clearly visible. It happens in every laptop performance mode and is noticeable everywhere, including regular desktop use and games.

I first noticed the stutter was absent right after reinstalling the drivers. On the first boot afterwards, powerd seems not to run, and the system is completely stutter-free, so powerd not starting on the first boot is probably a separate bug. I did not explicitly check whether the service was running, but the GPU clock was capped at roughly 2000 MHz, the same as when I stop powerd manually, so it most likely was not.

To reproduce:
Open the vsynctester website while the display is driven directly by the dGPU and watch for intermittent microstutter spikes. The higher the refresh rate, the more noticeable they are. At 240 Hz it is very stuttery, with drops to around 220 fps.

Alternatively, launch any demanding game with an uncapped framerate and watch for bursts of microstutter. I observed it myself in ARC Raiders and GTA V Enhanced.
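To put the vsynctester numbers in perspective, a minimal sketch of the frame-time arithmetic: at 240 Hz each refresh lasts about 4.17 ms, while 220 fps averages about 4.55 ms per frame, so on a vsynced 240 Hz display roughly 20 refreshes per second have to repeat a frame that was already shown. The `count_spikes` helper and its 1.5x threshold are hypothetical choices for flagging outlier frames in a list of frame times, and the sample values are made up, not taken from the bug report.

```python
"""Frame-time arithmetic for the 240 Hz / 220 fps case (illustrative sketch)."""


def interval_ms(rate_hz: float) -> float:
    """Duration of one refresh (or one frame) in milliseconds for a given rate."""
    return 1000.0 / rate_hz


def refreshes_without_new_frame(refresh_hz: float, fps: float) -> float:
    """Refreshes per second that must repeat an older frame when fps < refresh rate."""
    return max(0.0, refresh_hz - fps)


def count_spikes(frame_times_ms: list[float], refresh_hz: float, tolerance: float = 1.5) -> int:
    """Count frames slower than `tolerance` x the refresh interval (threshold is arbitrary)."""
    limit = interval_ms(refresh_hz) * tolerance
    return sum(1 for t in frame_times_ms if t > limit)


if __name__ == "__main__":
    print(f"240 Hz refresh interval: {interval_ms(240):.2f} ms")
    print(f"220 fps average frame time: {interval_ms(220):.2f} ms")
    print(f"repeated refreshes per second: {refreshes_without_new_frame(240, 220):.0f}")
    # Made-up frame times (ms), not measured on the affected laptop.
    sample = [4.2, 4.1, 8.3, 4.2, 12.5]
    print(f"spikes in sample: {count_spikes(sample, 240)}")
```

A frame that takes about twice the refresh interval (around 8.3 ms at 240 Hz) is exactly the kind of spike that shows up as a jagged line in a frametime graph.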

Specs
Laptop: Lenovo Legion Slim 5 16APH8
CPU: AMD Ryzen 7 7840HS
GPU: RTX 4060

Tested on Fedora and CachyOS. The issue does not occur without nvidia-powerd, but then the GPU is limited to 60 W.

The log is from March (the same one as on GitHub), but the issue also occurs on the latest drivers:

nvidia-bug-report.log.gz (482.0 KB)

For those encountering the issue, there is a workaround to get full performance:

Put load on both the GPU and the CPU, then freeze the nvidia-powerd process. That way the GPU keeps its full wattage while the stutters go away. This can be automated with a script, but the result is somewhat inconsistent: freezing the process locks in whatever power limits nvidia-powerd had set at that moment, and those limits are normally adjusted dynamically.
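One way to automate the freeze step is sketched below, assuming the daemon appears as `nvidia-powerd` in `/proc/<pid>/comm` and the script runs as root. It sends SIGSTOP to freeze the process and SIGCONT to resume it, which is the same thing `kill -STOP` / `kill -CONT` would do by hand. The `stop`/`cont` command-line interface and the helper names are hypothetical, not part of any NVIDIA tooling.

```python
#!/usr/bin/env python3
"""Sketch: freeze/resume nvidia-powerd with SIGSTOP/SIGCONT (assumes root)."""
import os
import signal
import sys
from pathlib import Path

DAEMON = "nvidia-powerd"  # assumed process name as shown in /proc/<pid>/comm


def find_pids(name: str, proc_root: str = "/proc") -> list[int]:
    """Return the sorted PIDs whose /proc/<pid>/comm equals `name`."""
    try:
        entries = list(Path(proc_root).iterdir())
    except OSError:
        return []  # no procfs available (e.g. not running on Linux)
    pids = []
    for entry in entries:
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/cpuinfo
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited or is not readable
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)


def set_frozen(pids: list[int], frozen: bool) -> None:
    """Send SIGSTOP (freeze) or SIGCONT (resume) to each PID."""
    sig = signal.SIGSTOP if frozen else signal.SIGCONT
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # the process went away in the meantime


def main(argv: list[str]) -> None:
    # Hypothetical CLI: "stop" freezes, "cont" resumes, anything else only reports.
    action = argv[1] if len(argv) > 1 else "status"
    pids = find_pids(DAEMON)
    if action == "stop":
        set_frozen(pids, True)
    elif action == "cont":
        set_frozen(pids, False)
    print(f"{action}: {DAEMON} PIDs {pids}")


if __name__ == "__main__":
    main(sys.argv)
```

Usage, with a hypothetical file name: start a game or stress test so nvidia-powerd raises the power limits, run `sudo python3 powerd_freeze.py stop`, and later `sudo python3 powerd_freeze.py cont` to hand control back to the daemon.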

You also need to add a D-Bus policy rule so that dbus doesn't run out of memory while the process is frozen:

<busconfig>
  <type>system</type>
  <policy user="root">
    <allow own="nvidia.powerd.server"/>
  </policy>
  <policy context="default">
    <deny send_destination="nvidia.powerd.server"/>
  </policy>
</busconfig>

Save this as /etc/dbus-1/system.d/nvidia-dbus.conf

Also, with nvidia-powerd frozen there is no dynamic power balancing, so the CPU will try to run as fast as it can in demanding scenarios. You'll probably want to cap the CPU clocks to avoid overheating (e.g. with cpupower).
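As an alternative to cpupower, the cap can be set directly through the kernel's cpufreq sysfs interface, which is roughly what `cpupower frequency-set --max` does. The sketch below assumes root and that each `cpuN/cpufreq/scaling_max_freq` file exists (it does with the usual cpufreq drivers); values are in kHz, the 3500 MHz example is an arbitrary figure rather than a tested setting for the 7840HS, the kernel may clamp it to the hardware's range, and the cap resets on reboot.

```python
#!/usr/bin/env python3
"""Sketch: cap the max CPU frequency via the cpufreq sysfs interface (assumes root)."""
import glob
import os
import sys


def cap_cpu_max_khz(max_khz: int, sysfs_root: str = "/sys/devices/system/cpu") -> list[str]:
    """Write `max_khz` to each cpuN/cpufreq/scaling_max_freq; return the paths written."""
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_max_freq")
    written = []
    for path in sorted(glob.glob(pattern)):
        # scaling_max_freq is expressed in kHz
        with open(path, "w") as f:
            f.write(str(max_khz))
        written.append(path)
    return written


if __name__ == "__main__":
    # Hypothetical usage: pass the cap in MHz (e.g. 3500); with no argument, do nothing.
    if len(sys.argv) > 1:
        for p in cap_cpu_max_khz(int(sys.argv[1]) * 1000):
            print("capped", p)
```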