High CPU usage on Xorg when an external monitor is plugged in

Hi,

Here is what I tried for this issue.

What I did:
a) backed up libglxserver_nvidia.so.515.65.01 as libglxserver_nvidia.so.515.65.01.save
b) renamed libglxserver_nvidia.so.515.65.01 to nvidia_libglxserver.so.515.65.01
c) created a symbolic link called nvidia_libglxserver.so pointing to nvidia_libglxserver.so.515.65.01
d) this prevents gnome-shell from ending up in Nvidia VRAM while primary_gpu is set
e) if you like, you can also create the udev rule I posted before
f) rebooted the machine
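
Steps a) to c) can be sketched as a small shell function. This is a hedged sketch: the library directory and version are parameters because the install location varies by distro (on Ubuntu it is typically /usr/lib/x86_64-linux-gnu/nvidia/xorg, but check your own system).

```shell
# Sketch of steps a)-c); libdir and version are parameters, nothing is hardcoded.
relink_glx() {
    libdir="$1"   # e.g. /usr/lib/x86_64-linux-gnu/nvidia/xorg (assumption, distro-specific)
    ver="$2"      # e.g. 515.65.01

    # a) keep a backup of the original library
    cp "$libdir/libglxserver_nvidia.so.$ver" "$libdir/libglxserver_nvidia.so.$ver.save"
    # b) rename the library
    mv "$libdir/libglxserver_nvidia.so.$ver" "$libdir/nvidia_libglxserver.so.$ver"
    # c) point a generic symlink at the renamed file
    ln -sf "nvidia_libglxserver.so.$ver" "$libdir/nvidia_libglxserver.so"
}
```

Run it as root against your actual driver directory, then reboot as in step f).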

The problem: I still can't launch on the dedicated graphics…
Now using Xubuntu 22.04.1 with the recent Nvidia driver 515.65.01

Thank you

luiz

Looks like I have a local repro on the configuration below:

Acer Predator PH315-53 + Ubuntu 20.04 + kernel 5.8.0-50-generic + Nvidia GeForce GTX 1650 Ti + Driver 515.57 + 1 Display with HDMI connection

I can see that CPU utilization by the Xorg process is around 23.5% with the above configuration, with the power level at P8 in Auto (Preferred) mode ---- REPRO
However, when I changed it to Prefer Maximum Performance mode, CPU utilization drops to ~3% with the power level at P3.

Later I removed the external display and saw CPU utilization of 0.7% with the power level at P3 in Prefer Maximum Performance mode.
With no external display connected, in Auto mode, CPU utilization by the Xorg process is 0.7% with the power level at P8.
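
For anyone correlating these numbers on their own machine, the P-state and power draw can be watched from a terminal while reproducing; a minimal sketch using standard nvidia-smi query fields (it only reads GPU state, so run it alongside your usual CPU monitor):

```shell
# Print the P-state, GPU utilization and power draw once per second;
# pstate, utilization.gpu and power.draw are standard --query-gpu fields.
nvidia-smi --query-gpu=pstate,utilization.gpu,power.draw --format=csv -l 1
```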

Please confirm from your end whether this can be considered a repro, so that we can debug in the same direction for the root cause.

I have also filed bug 3776073 internally for tracking purposes.

@amrits I can confirm that the behaviour you describe matches what I see on a Dell Inspiron 7610 with an RTX 3060 and the latest Nvidia drivers (also prior releases) on Fedora 36. On this device, the Nvidia GPU is the card with exclusive control over the USB-C (Thunderbolt) output.

Note that the Dell Inspiron 7510 and Dell Vostro 7510 may also ship in configurations including an Nvidia GPU, and the Inspiron 7610 also ships with, e.g., the RTX 3050. These devices, I believe, are all wired up identically, so all would be affected.

This issue can be amplified by limiting the Nvidia GPU's performance through nvidia-smi (simply assign minimum clocks).

The behaviour you are observing essentially smells like massive polling overhead for the sake of synchronizing “something”; in a modern world, waiting on a completion event would really help.

@shoffmeister
Thanks for the confirmation.
We will debug the issue and keep you updated.


Thanks, I hope this will be fixed soon; in the meantime my laptop is stuck on Windows until this thing finally gets fixed.

Can confirm this is happening on my Legion 5 laptop on Manjaro Linux (and the same on any other Linux distro).
The high CPU usage and choppy performance happen in Hybrid mode with the Nvidia Adaptive profile. Switching to High Performance mode fixes render performance and CPU usage, but at the cost of battery life. Using Nvidia-only mode also solves the CPU usage and performance issue, but comes with its own set of problems (power consumption, etc.).


When switched to Nvidia Performance mode, I noticed the power consumption constantly peaked at around 30 W in the idle state, just moving the cursor, not rendering anything.

I have been poking at the Nvidia GPU in my laptop for a bit - some observations:

  • nvidia-smi is a very very useful tool
  • nvitop (GitHub: XuehaiPan/nvitop), an interactive NVIDIA GPU process viewer and one-stop solution for GPU process management, is cute
  • nvidia-smi allows setting a number of parameters which have a direct, controlled impact on the performance of the GPU itself; “performance mode” and the like are just names for sets of parameters which effectively translate into clock limits (which can be expressed more transparently through nvidia-smi)
  • forcing the GPU into a really low power mode (minimum clocks for the processor and the memory) tends to expose design challenges much better than anything else; for some experiments I was clocking to 100/100 (which is invalid data, but nvidia-smi seems to grasp the intent and floor the parameters at the minimum acceptable values)
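
For reference, the clock-to-the-floor experiment can be done with nvidia-smi's clock-locking switches; a sketch (run as root; out-of-range values such as 100 get clamped to the supported minimum, as described above):

```shell
# Lock GPU and memory clocks to 100 MHz; nvidia-smi floors out-of-range
# values at the minimum supported clocks.
sudo nvidia-smi --lock-gpu-clocks=100,100
sudo nvidia-smi --lock-memory-clocks=100,100

# Undo the experiment afterwards:
sudo nvidia-smi --reset-gpu-clocks
sudo nvidia-smi --reset-memory-clocks
```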

Given that,

  • high performance shoots the processor and memory clocks sky-high → high power consumption in the Nvidia package → plenty of energy flowing through the system → thermal load on the Nvidia side → Nvidia fan(s) spin up
  • battery saving clocks everything down on the Nvidia side → the CPU driver keeps polling, desperately waiting for the Nvidia infra to complete → CPU load → thermal load on the CPU side → system fan(s) spin up

Solution: No polling, please. Event completion, please.

(Sorry, I don’t know which event the Nvidia infra is polling for, but I am highly confident that a switch from polling to event-triggered processing really could make a difference, in case this is technically feasible. I suspect the Nvidia driver is polling for memory-copy completion, because in my case that’s the only thing the driver is tasked with.)
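
To illustrate the polling-vs-event distinction, here is a toy shell sketch; the creation of a file stands in for "GPU work completed", which is purely an assumption for illustration (the real driver waits on GPU state, not files):

```shell
# Busy-polling: spins a CPU core until the marker file appears.
poll_wait() {
    while [ ! -e "$1" ]; do
        :   # spin; this is the kind of loop that shows up as Xorg CPU load
    done
}

# Event-driven alternative: sleep until the kernel signals the event
# (inotifywait is from inotify-tools; shown for contrast, not invoked below).
event_wait() {
    inotifywait -qq -e create "$(dirname "$1")"
}

marker="$(mktemp -u)"
( sleep 0.2; touch "$marker" ) &   # "completion" arrives a little later
poll_wait "$marker"                # burns CPU for the whole 0.2 s
echo "completed"
```

The polling version finishes with the same result as the event-driven one; the difference is that it consumes a full core while waiting, which is exactly the symptom reported in this thread.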


Hello,
Same problem for me on a Lenovo Legion 5 Pro (RTX 3060) with Ubuntu.

I ran ps -a | grep Xorg | awk '{print $1}' | xargs kill 9
So bad!
How can I undo this bad command, please?


It’s not an Nvidia driver problem. I have the same issue on the i5-1135G7 Frame.work laptop, which has only an Intel iGPU. It’s Xorg’s problem.

I am using an HDMI-to-VGA converter, so I think this is an issue between Xorg and some monitor controllers.

The true solution is to use Wayland. I went back to X11 because KDE broke Wayland some time ago.

Really? I have an old Intel processor and it runs just fine; on my gaming laptop, when it uses only the AMD iGPU, Xorg is also fine. I had enabled Wayland on Nvidia and it was terrible: everything went crazy and most applications wouldn’t run until I reverted back to Xorg.

So it is probably some issue with Xorg and a specific set of monitor + GPU + driver.

Also, I went to Wayland with Intel because the Iris Xe drivers were disgustingly bad on X11. They are kinda fine now, but I still have weird issues like this one. And they were working great on Wayland from the beginning.

So in other words, you really have nothing to add.

“Same issue” doesn’t mean “same root cause”. I can write a for loop that just burns CPU cycles; what’s your point?

People above were blaming Nvidia drivers for Xorg burning CPU with some monitors, but in my case the Xorg process burns CPU in the same manner (~30% usage with some monitors) while my laptop has never even seen Nvidia drivers, so it must be an Xorg problem, not an Nvidia one.

I have clearly demonstrated (with a flamechart) that seriously undue CPU consumption during effectively idle operation, in a specific scenario/environment, can be attributed to code being executed inside the Nvidia driver.

Please do advise, based on the detailed technical analysis which I am confident you will provide, where specifically to look for the alternative root cause, if not in the Nvidia driver.
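
For anyone who wants to produce a comparable flamechart, one common route is sketched below; it assumes perf (from linux-tools) and Brendan Gregg's FlameGraph scripts on PATH, both of which are assumptions about your setup, not tools mentioned in the original post:

```shell
# Sample Xorg's call stacks at 99 Hz for 30 seconds while reproducing the load.
sudo perf record -F 99 -g -p "$(pidof Xorg)" -- sleep 30
# Fold the stacks and render an interactive SVG flame graph.
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > xorg-flame.svg
```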

Hi @amrits!
Any news on this? Have you investigated anything? Maybe you could suggest a workaround for this?

PS. I’ve tried the latest 520.x driver with the latest X amdgpu driver compiled from master. Still high usage.

Sorry for digging this topic up.

I just found out that on Xorg the issue is still there, but to my surprise it’s no longer happening on Wayland.
Running Fedora 37 with driver 520.56.06 on a Lenovo Legion 5, AMD + RTX 2060, connected to an external 1440p monitor over HDMI.

We have been able to root-cause the issue and are checking the fixes internally.
We shall communicate further once the fix is integrated and released publicly.


Same problem here. ASUS ROG Strix G15 G513RM, AMD iGPU and Nvidia dGPU. When using an external monitor via an HDMI cable with a Reverse PRIME configuration, the X process eats 25-30% CPU. When I use only one monitor without the Reverse PRIME configuration (internal with the iGPU, or external with the dGPU), everything is fine. I want to be able to use both monitors via Reverse PRIME without X eating a third of my CPU. I hope the fix that @amrits is talking about will be released soon and will solve this X CPU utilization problem.
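
For context, a Reverse PRIME setup like the one described is typically wired up with xrandr's provider commands; a sketch in which the provider names ("NVIDIA-G0", "modesetting") are machine-specific examples, not guaranteed names, so check your own provider list first:

```shell
# List the available render/display providers; the names printed here
# are what the next command needs.
xrandr --listproviders

# Reverse PRIME: let the Nvidia dGPU's physical outputs (e.g. the HDMI port)
# display images rendered by the iGPU's modesetting provider.
xrandr --setprovideroutputsource NVIDIA-G0 modesetting
```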

I have tested the latest beta driver (525.53). To my great regret, I found that it does not contain a fix for this particular issue.
I hadn’t really expected it to, because amrits has yet to “communicate further” on the matter.
But nevertheless I tried, and am sharing my experience here.
I hope the fix is integrated quickly, and of course released.
SETUP: ASUS TUF A17 (FA706IU), AMD 4800H, Geforce GTX 1660Ti
