Nvidia X11 driver busy-polls the kernel with clock_gettime in a tight loop

On my system, the Nvidia 515 release X11 driver keeps polling the Linux kernel for clock_gettime (through libc) in a tight busy loop:

On an otherwise idle system, this consumes up to 40% CPU.

This problem does not seem to be new; it has been reported before in "High CPU usage on xorg when the external monitor is plugged in", but the follow-up postings seem to have watered down the very good technical opening post. I am therefore creating this fresh, more specific topic.

What you see above is a nice rendition of the Xorg process, created with GitHub - janestreet/magic-trace: magic-trace collects and displays high-resolution traces of what a process is doing. Under the hood this is simply perf, but with a friendlier presentation in the form of a flame graph (sudo magic-trace attach -pid $PID_OF_XORG; the output can then be rendered in multiple ways, and I run a strictly local server for this, which is simple to do).

This rendition shows that on my idle Tiger Lake-H 8-core system, almost all of the consumed CPU goes to Xorg, and within Xorg to what appears to be a tight loop inside the (closed-source) nvidia_drv, the Nvidia X11 driver module.

This RTX 3060 Optimus notebook is running Fedora 36 (latest kernel, latest Mesa, latest KDE, latest X), set up as follows:

  • Intel GPU serves (only) the internal display (and HDMI)
  • Nvidia GPU serves (only) the USB-C output (via DisplayPort Alternate Mode to a DisplayPort monitor)
  • Intel is the primary GPU, Nvidia is offloading

Notebook screen: 3072x1920 @ 60.14 Hz
External screen: 3840x2160 @ 60.00 Hz (no G-Sync)

This problem gets reduced a little by forcing the GPU to prefer maximum performance; this clocks everything up, even though I seem to be consuming bandwidth only for the memory transfers to the Nvidia-controlled connector / crtc. And clocking everything up creates loads of heat and noise.

A good way to demonstrate this problem on my notebook is to run

sudo nvidia-smi --reset-memory-clocks
sudo nvidia-smi --lock-memory-clocks=100,100

to force the GPU into a lower power state (don't worry about the 100,100; apparently nvidia-smi will auto-correct that). Things do get a little bit laggy, but suddenly CPU consumption is even higher, 50+%, with many more clock_gettime calls (in a busy loop).

So looking at this from the outside, there is a strong correlation between a low memory clock on the GPU and (very high) CPU utilization from busy-polling the Linux kernel's clock_gettime.

But why, and how can this be stopped please?

I only want the Nvidia driver to show the framebuffer content it was handed (crtc → port); in PRIME offload, it doesn't produce anything on top of that unless I tell it to do so.

Now that the driver code is open source, maybe you could compile it with debug info enabled and re-run your experiment? That would probably allow you to pinpoint the root cause of this issue. NVIDIA doesn't seem very interested in investigating this one…

The Nvidia graphics stack for X11 comprises two components (a gross oversimplification): the Linux kernel graphics interface (nvidia_drm / nvidia_modeset, open-sourced at GitHub - NVIDIA/open-gpu-kernel-modules: NVIDIA Linux open GPU kernel module source) and the X11 graphics driver itself. The X11 graphics driver, "nvidia", remains closed (and is huge):

lsmod | grep nvidia

nvidia_drm             73728  0
nvidia_modeset       1146880  1 nvidia_drm
nvidia_uvm           1286144  0
nvidia              40849408  74 nvidia_uvm,nvidia_modeset

The second column is the module size in bytes. All that polling and calling of clock_gettime happens in loops inside nvidia (the X11 component).

I believe Nvidia have no plans to open up the X11 part (and reviewing something that ends up being a 40 MB binary for open-sourcing is not exactly something that would offer terrific business value).

In all honesty, I do not have the energy to trace through compiled code that has been stripped of almost all supporting metadata (ELF symbols, debug symbols, …); that lack of metadata also means that any tooling one might employ has its hands tied behind its back.

The only recourse is for someone with knowledge of the overall driver architecture and with access to the source code to root cause this. I fear the end result might be a simple “works as designed” (where the design constraint might stem from non-technical issues).

I managed to reduce the CPU usage by forcing the system to use only the Nvidia GPU, following the instructions at:

The X11 process changed from a constant 20%-40% CPU to 0%, and the Nvidia GPU is OK.

Great analysis! Obviously, the driver is polling for something in a busy loop, checking the clock on each iteration for a timeout. The timeout it is waiting on probably depends on the clock speed; is that why CPU consumption is higher when the GPU clock speed is low? What could it be waiting for? A memory transfer between system memory and the GPU?