High CPU usage on Xorg when an external monitor is plugged in

As far as I can tell, this depends on the specific hardware video output path (configuration) of the device.

Case in point: my Dell Inspiron 16 Plus (7610), a Tiger Lake 11800H + Nvidia RTX 3060 device which has one HDMI output and one USB-C port (which carries graphics data via Thunderbolt Alternate Mode or DisplayPort Alternate Mode).

The HDMI port is exclusively served by the Intel GPU, and the Intel GPU itself is unable to send data via USB-C.

The Nvidia GPU is the only GPU that is able to generate any output through the USB-C port (and, IIRC, the Nvidia GPU is unable to output to the built-in screen of the notebook).

(I believe I am describing a MUX-less design here, which is the “cheap” way of wiring up the hardware.)

In that configuration, with the exact same external 4K screen attached to the device via

  • HDMI == only the Intel GPU is involved; all good
  • USB-C (using an Alternate Mode) == the Intel GPU serves the built-in display; the Nvidia GPU serves the external screen

In the latter configuration, you will see massive CPU load on the Xorg process, and all of that load comes from the in-process Nvidia driver, with a massive number of calls to the Linux vDSO to get the current time.

This posting of mine shows more technical detail: Nvidia X11 driver busy-polls kernel on clock_gettime in a tight loop. It also includes an nvidia-smi call that exposes the problem much more clearly: things get a whole lot worse - much worse than the badness in the “normal” case - if you downclock the Nvidia GPU.
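For reference, the downclocking can be done with nvidia-smi’s clock-locking switches; a minimal sketch, assuming your driver supports clock locking (the 210 MHz value is an assumption - check nvidia-smi -q -d SUPPORTED_CLOCKS for your GPU’s actual minimum):

    sudo nvidia-smi --lock-gpu-clocks=210,210   # pin GPU clocks low; the busy-polling becomes much more visible
    sudo nvidia-smi --reset-gpu-clocks          # restore default clock management afterwards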

I believe one use case may be handled rather badly by the existing Nvidia driver: If the Nvidia GPU is connected to the output (“connector”) but does not do the rendering itself.

On a hybrid notebook that is not really abnormal - personally, I want to use the Nvidia cores not for graphics, but for compute; it’s just an (unfortunate) fact of life that the notebook vendor decided to wire up the Nvidia GPU to be the one and only device producing output on USB-C.

Is this a case of poorly configured or unconfigured systems encountering a systemd/D-Bus bug that continually knocks services out, including GDM, causing them to restart continually?

For my Dell Inspiron 7610, this is not the case - I am linking to a Linux kernel perf trace (rendered, effectively, through a Chrome flamegraph) which demonstrates very clearly that

  • there are no errors
  • the Nvidia driver keeps spamming the vDSO to get the current time - apparently in an attempt to sync up something (it’s hard to see the details with a totally stripped Nvidia X driver)
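For anyone who wants to capture a comparable trace, something along these lines should work; a sketch, assuming perf is installed and the display server process is named Xorg:

    sudo perf record -g -p "$(pgrep -x Xorg)" -- sleep 10   # sample the Xorg process with call graphs for 10 seconds
    sudo perf report                                        # the vDSO clock_gettime calls should dominate the profile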

For me, your info just confirms that this is a systemd issue.
I believe they call this behaviour progress, not a bug.
There have been multiple patches that downstream distros haven’t handled well.
It’s a mess. The impression I get is that systemd/kernel devs are playing Simon Says and everyone else is scrambling to catch up.

I believe this also presents on some systems with:
“[drm:nv_drm_master_set [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership”

“mtd device must be supplied (device name is empty)”

In GDM this causes high CPU usage (GDM continually being kicked off D-Bus); in KDE it causes network dropouts and freezing, and it prevents some systems from booting at all.

I could be completely wrong.

This thread conflates a number of issues:

  • The starting post had very obvious problems visible in the X log (see their Nvidia bug report attachment).
  • Then people jumped in and said “my CPU is high”; I suspect that for many of these people the root cause is not found in the X log, but elsewhere.

I did notice this thread before posting my Nvidia X11 driver busy-polls kernel on clock_gettime in a tight loop question - I posted separately just to start with a clean slate.

Alas, my posting never received any attention from Nvidia, although I suspect this to be a rather structural and generic challenge in the Nvidia driver on X, when it comes to coordinating “something” across two GPUs (Intel and Nvidia, in my case).


Lenovo T15 Gen 1; the problem is highly visible and it hurts a lot. Arch Linux, no Gnome or other garbage, just Xorg and i3wm. I believe the laptop’s built-in monitor is connected to the internal eDP-1 output while all other outputs are connected to the Nvidia GPU. The problem is equally reproducible with the HDMI port or with an HDMI cable plugged into a USB-C hub (which appears as a DP port in that case).

Xorg is configured such that no app uses the Nvidia GPU directly unless specifically invoked with prime-run; e.g. nvtop shows no apps using the GPU except Xorg itself.
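As a quick sanity check of that split, something like the following should report the Intel iGPU by default and the Nvidia GPU only when offloading; a sketch, assuming Arch’s nvidia-prime package (which provides prime-run) and mesa-utils (which provides glxinfo):

    glxinfo | grep "OpenGL renderer"              # default path: expect the Intel iGPU
    prime-run glxinfo | grep "OpenGL renderer"    # offloaded: expect the Nvidia GPU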

Once I activate the second monitor with xrandr, the CPU usage of the Xorg process spikes to 100% and the rendering of even primitive things becomes noticeably slower. Disabling the second monitor gets things back to normal. Enabling maximum GPU performance in the nvidia-settings app does relieve the pressure a bit - Xorg CPU usage goes down to ~25% and things become more responsive - but at the cost of power consumption: the fans start spinning up.
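For concreteness, toggling the second monitor amounts to something like this; DP-1-1 is an assumed connector name - check the output of a bare xrandr call for the real one:

    xrandr --output DP-1-1 --auto --right-of eDP-1   # enable the external monitor; Xorg CPU usage spikes
    xrandr --output DP-1-1 --off                     # disable it; CPU usage returns to normal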

I did try to strace the Xorg process in the past and noticed an unusually high number of calls to a gettime (or similar) syscall, many hundreds per second. Yet nothing gets written to the Xorg log file, or to any other log, that would indicate a frequently repeated action or a process starting and dying instantly.
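A counting strace along these lines makes the call rate visible; a sketch, assuming the process is named Xorg:

    sudo strace -c -e trace=clock_gettime -p "$(pgrep -x Xorg)"
    # let it run for a few seconds, then press Ctrl-C for a per-syscall call count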

FWIW, strace will only show the problem, IIRC, if the vDSO for clock_gettime is not used.

If you see this via strace, your CPU load will be even higher compared to systems where the vDSO is used.

Wonder if this issue has anything to do with it:

No - I checked with dbus-monitor, there is no traffic. The D-Bus daemon is also sitting nearly idle.

No, the systemd service and the X11 driver are two totally distinct things.

Hi,

I would like to share my “workaround”.

Let’s go:
my BIOS is set to MSHybrid (Nvidia Optimus)
prime-select is set to on-demand

I inserted the following into 70-uaccess.rules (udev), after the line # DRI video devices:

  • SUBSYSTEM=="drm", KERNEL=="fb0", TAG+="uaccess" ← avoids built-in monitor glitches when the Nvidia GPU is set as primary GPU

In 10-nvidia.conf:

  • Option "PrimaryGPU" "Yes"
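For context, a complete 10-nvidia.conf carrying that option might look like the sketch below; the Identifier and the location under /etc/X11/xorg.conf.d/ are assumptions that can differ per distro:

    Section "OutputClass"
        Identifier "nvidia"
        MatchDriver "nvidia-drm"
        Driver "nvidia"
        Option "PrimaryGPU" "Yes"
    EndSection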

It is not a final solution, but I believe it can contribute something to this issue…

Thank you

Luiz


Without the PrimaryGPU “Yes” option, xvidtune -show reports wrong info, and the Steam game champions_of_regnum fails to use the right resolution (it ignores the external display’s default resolution), even when the game is launched with the PRIME offload environment or with vgaswitcherooctl (vgaswitcherooctl launch %command%) :(
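For reference, Nvidia’s documented PRIME render offload variables would go into the Steam launch options roughly like this; a sketch, not verified against this particular game:

    __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia %command%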

My build is Ubuntu 22.04.1 LTS, nvidia driver 515.65.01

Other info: when I ran lshw -businfo -class display, I got this:

    Bus info          Device    Class    Description
    pci@0000:01:00.0  /dev/fb0  display  GA104M [GeForce RTX 3080 Mobile / Max-Q 8GB/16GB]
    pci@0000:00:02.0  /dev/fb0  display  TigerLake-H GT1 [UHD Graphics]

Both are using the same fb0 device.

And after running hwinfo --gfxcard, I got this:
Primary display adapter: #15 (nvidia)
even though the BIOS and prime-select are set to MSHybrid and on-demand, respectively.

Hi,

Look at what I described about this issue…

What I did (steps a-c are sketched as shell commands below):
a) backed up libglxserver_nvidia.so.515.65.01 as libglxserver_nvidia.so.515.65.01.save
b) renamed libglxserver_nvidia.so.515.65.01 to nvidia_libglxserver.so.515.65.01
c) created a symbolic link called nvidia_libglxserver.so pointing to nvidia_libglxserver.so.515.65.01
d) this prevents gnome-shell from allocating Nvidia VRAM while PrimaryGPU is set
e) optionally, create the udev rule I posted before
f) reboot the machine
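A minimal shell sketch of steps a-c; the directory is an assumption (it is where Ubuntu’s driver packages usually place the library) - locate the file on your own system first:

    cd /usr/lib/xorg/modules/extensions   # assumed location; verify with: find / -name 'libglxserver_nvidia*'
    sudo cp libglxserver_nvidia.so.515.65.01 libglxserver_nvidia.so.515.65.01.save
    sudo mv libglxserver_nvidia.so.515.65.01 nvidia_libglxserver.so.515.65.01
    sudo ln -s nvidia_libglxserver.so.515.65.01 nvidia_libglxserver.so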

The remaining problem: I can’t launch on dedicated graphics yet…
Now using Xubuntu 22.04.1 with the recent Nvidia driver 515.65.01.

Thank you

Luiz

Looks like I have a local repro on the configuration setup below -

Acer Predator PH315-53 + Ubuntu 20.04 + kernel 5.8.0-50-generic + Nvidia GeForce GTX 1650 Ti + Driver 515.57 + 1 Display with HDMI connection

With the above configuration, CPU utilization of the Xorg process is around 23.5%, with power level P8, in Auto Preferred Mode ---- REPRO
However, when I change it to Prefer Maximum Performance Mode, CPU utilization drops to ~3% and the power level sits at P3.

After removing the external display, CPU utilization is 0.7% and the power level is P3 in Prefer Maximum Performance Mode.
With no external display connected and Auto mode, CPU utilization of the Xorg process is 0.7% and the power level is P8.
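The power level and draw can be watched live with an nvidia-smi query along these lines; a sketch - see nvidia-smi --help-query-gpu for the full field list:

    nvidia-smi --query-gpu=pstate,power.draw,utilization.gpu --format=csv -l 1   # refresh every second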

Please confirm from your end whether this can be considered a repro, so that we can debug in the same direction for the root cause.

I have also filed bug 3776073 internally for tracking purposes.

@amrits I can confirm that the behaviour you describe matches what I see on a Dell Inspiron 7610 with RTX 3060, on the latest Nvidia drivers (and prior versions) on Fedora 36. In the case of this device, the Nvidia GPU is the card with exclusive control over the USB-C (Thunderbolt) output.

Note that the Dell Inspiron 7510 and Dell Vostro 7510 may also ship in configurations including an Nvidia GPU, and the Inspiron 7610 also ships with, e.g., an RTX 3050. These devices, I believe, are all wired up identically, so they would all be affected.

This issue can be amplified by limiting the Nvidia GPU performance through nvidia-smi (simply assign minimum clocks).

The behaviour you are observing, essentially, smells like massive polling overhead for the sake of synchronizing “something”; in a modern world, waiting on a completion event would really help.

@shoffmeister
Thanks for the confirmation.
We will debug the issue and keep you updated.


Thanks, I hope this will be fixed soon; in the meantime my laptop is stuck on Windows until this thing finally gets fixed.

Can confirm this is happening on my Legion 5 laptop on Manjaro Linux (and the same on any other Linux).
The high CPU usage and choppy performance occur in Hybrid mode with the Nvidia Adaptive profile. Switching to the High Performance profile fixes render performance and CPU usage, but at the cost of battery life. Using Nvidia-only mode likewise solves the CPU usage and performance issue, but comes with its own set of problems (power consumption, etc.).
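The profile switch corresponds to the PowerMizer setting, which can also be flipped from the command line; a sketch, assuming a single GPU at index 0:

    nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1"   # 1 = Prefer Maximum Performance, 0 = Adaptive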


When switched to Nvidia Performance mode, I noticed power consumption constantly peaking at around 30 watts while idle, just moving the cursor and not rendering anything.