Nvidia X11 driver busy-polls kernel on clock_gettime in a tight loop

On my system, the Nvidia 515 release X11 driver keeps polling the Linux kernel for clock_gettime (through libc) in a tight busy loop:

On an otherwise idle system, this consumes up to 40% CPU.

This problem does not seem to be new; it has been reported before in High CPU usage on xorg when the external monitor is plugged in, but the follow-up posts seem to have watered down the very good technical analysis of the opening post. I am therefore creating this very specific fresh topic.

What you see above is a rendition of the Xorg process created by GitHub - janestreet/magic-trace: magic-trace collects and displays high-resolution traces of what a process is doing. This is simply perf, but with a friendlier presentation in the form of a flamegraph (sudo magic-trace attach -pid $PID_OF_XORG; the output can then be rendered in multiple ways. I have a strictly local server running for this, which is simple to set up).

This rendition shows that on my idle Tiger Lake-H 8-core system, almost all of the CPU is consumed by Xorg, and within Xorg by what appears to be a tight loop inside the (closed-source) nvidia_drv, the Nvidia X11 driver module.

This RTX 3060 Optimus notebook is running Fedora 36, latest kernel, latest Mesa, latest KDE, latest X, with the following GPU configuration:

  • Intel GPU serves (only) the internal display (and HDMI)
  • Nvidia GPU serves (only) the USB-C output (via DisplayPort Alternate Mode to a DisplayPort display)
  • Intel is the primary GPU, Nvidia is offloading

Notebook screen: 3072x1920 @ 60.14 Hz
External screen: 3840x2160 @ 60.00 Hz (no G-Sync)

This problem gets reduced a little by forcing the GPU to prefer maximum performance; this clocks everything up, although I seem to be consuming only memory-transfer bandwidth for the Nvidia-controlled connector / CRTC. And clocking everything up creates a lot of heat and noise.

A good way to demonstrate this problem on my notebook is to run

sudo nvidia-smi --reset-memory-clocks
sudo nvidia-smi --lock-memory-clocks=100,100

to force the GPU into a lower power state (don’t worry about the 100,100 - apparently nvidia-smi will auto-correct that). Things do get a little bit laggy, but suddenly CPU consumption is even higher, 50+%, with many, many more clock_gettime calls (in a busy loop).

So looking at this from the outside, there is a strong correlation between a low memory clock on the GPU and (very high) CPU utilization from busy polling the Linux kernel's clock_gettime.

But why, and how can this be stopped please?

I only want the Nvidia driver to display the framebuffer content it was handed (crtc → port); in PRIME offload, it doesn’t produce anything on top of that unless I tell it to do so.