What I did:
a) backed up libglxserver_nvidia.so.515.65.01 as libglxserver_nvidia.so.515.65.01.save
b) renamed libglxserver_nvidia.so.515.65.01 to nvidia_libglxserver.so.515.65.01
c) created a symbolic link named nvidia_libglxserver.so pointing to nvidia_libglxserver.so.515.65.01
d) this prevents gnome-shell from running in NVIDIA VRAM while primary_gpu is set
e) optionally, create the udev rule I posted before
f) reboot the machine
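The rename-and-symlink steps above can be sketched as a small script. This is a minimal sketch, not the poster's exact commands: the `module_dir` argument and the helper name `relink_glx_module` are my assumptions, and on a real system the directory would be wherever the driver installed the GLX module (run as root, then reboot).

```python
import os
import shutil

def relink_glx_module(module_dir: str, version: str = "515.65.01") -> str:
    """Back up, rename, and symlink the NVIDIA GLX server module.

    Mirrors steps a)-c): keep a .save backup, rename the versioned
    library, and point a plain .so symlink at the renamed file.
    """
    old = os.path.join(module_dir, f"libglxserver_nvidia.so.{version}")
    renamed = os.path.join(module_dir, f"nvidia_libglxserver.so.{version}")
    link = os.path.join(module_dir, "nvidia_libglxserver.so")

    shutil.copy2(old, old + ".save")             # a) backup copy
    os.rename(old, renamed)                      # b) rename the library
    os.symlink(os.path.basename(renamed), link)  # c) relative symlink
    return link
```

The symlink target is relative (just the file name), so the link stays valid if the module directory is ever moved as a whole.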
The problem is that I can't launch on dedicated graphics yet …
Now using Xubuntu 22.04.1 with the recent NVIDIA driver 515.65.01.
With the above configuration, CPU utilization consumed by the Xorg process is around 23.5% and the power level is P8 in Auto (Preferred) mode ---- REPRO
However, when I change it to Prefer Maximum Performance mode, CPU utilization drops to ~3% and the power level is P3.
After removing the external display, CPU utilization is 0.7% and the power level is P3 in Prefer Maximum Performance mode.
With no external display connected, in Auto mode, CPU utilization consumed by the Xorg process is 0.7% and the power level is P8.
Please confirm from your end whether this can be considered a repro, so that we can debug in the same direction for the root cause.
I have also filed bug 3776073 internally for tracking purposes.
@amrits I can confirm that the behaviour you describe matches what I see on a Dell Inspiron 7610 with an RTX 3060 and the latest NVIDIA drivers (and prior versions) on Fedora 36. On this device, the NVIDIA GPU is the card with exclusive control over the USB-C (Thunderbolt) output.
Note that the Dell Inspiron 7510 and Dell Vostro 7510 may also ship in configurations including an NVIDIA GPU, and the Inspiron 7610 also ships with, e.g., an RTX 3050. These devices are, I believe, all wired up identically, so all would be affected.
This issue can be amplified by limiting the Nvidia GPU performance through nvidia-smi (simply assign minimum clocks).
The behaviour you are observing, essentially, smells like massive polling overhead for the sake of synchronizing “something”; in a modern world, waiting on a completion event would really help.
Can confirm this is happening on my Legion 5 laptop on Manjaro Linux (same on any Linux distro).
The high CPU usage and choppy performance happen in Hybrid mode with the NVIDIA Adaptive profile. Switching to High Performance mode solves the render performance and CPU usage, but at the cost of battery life. Using NVIDIA-only mode also solves the CPU usage and performance issue, but comes with its own set of problems (power consumption, etc.).
When switched to NVIDIA Performance mode, I noticed the power consumption constantly peaking at around 30 W even at idle, just moving the cursor, not rendering anything.
nvidia-smi allows setting a number of parameters that have a direct, controlled impact on the performance of the GPU itself; "performance mode" and the like are just names for sets of parameters that effectively translate into clock limits (which can be expressed more transparently through nvidia-smi).
Forcing the GPU into a genuinely low-power mode (minimum clocks for the processor and the memory) tends to expose design problems much better than anything else; for some experiments I set clocks to 100/100 (which is invalid data, but nvidia-smi seems to grasp the intent and floors the parameters at the minimum acceptable values).
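The clock-pinning experiment described above could be wrapped like this. `-lgc` (lock graphics clocks) and `-lmc` (lock memory clocks) are real nvidia-smi options; the wrapper function and its `dry_run` guard are my own sketch, and the dry-run path just returns the commands so the snippet runs on machines without an NVIDIA driver.

```python
import shutil
import subprocess

def lock_gpu_clocks(min_mhz: int, max_mhz: int, dry_run: bool = False):
    """Pin GPU graphics and memory clocks via nvidia-smi.

    Out-of-range values (e.g. 100,100) are floored by nvidia-smi to the
    minimum supported clocks, as described in the post above.
    """
    cmds = [
        ["nvidia-smi", "-lgc", f"{min_mhz},{max_mhz}"],  # lock graphics clocks
        ["nvidia-smi", "-lmc", f"{min_mhz},{max_mhz}"],  # lock memory clocks
    ]
    if dry_run or shutil.which("nvidia-smi") is None:
        return cmds  # no driver present: just report what would run
    for cmd in cmds:
        subprocess.run(cmd, check=True)  # requires root on most setups
    return cmds
```

Clocks can later be released with `nvidia-smi -rgc` and `nvidia-smi -rmc`.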
Given that,
high performance shoots the processor and the memory clock to the sky → high power consumption on the Nvidia package → plenty of energy flowing through the system → thermal load on Nvidia side → Nvidia fan(s) spin up
battery saving clocks everything down Nvidia side → CPU driver keeps polling desperately waiting for the Nvidia infra to complete → CPU load → thermal load on CPU side → system fan(s) spin up
Solution: No polling, please. Event completion, please.
(Sorry, I don’t know which event the NVIDIA infra is polling for, but I am highly confident that a switch from polling to event-triggered processing could really make a difference, if it is technically feasible. I suspect the NVIDIA driver is polling for memory-copy completion, because in my case that’s the only thing the driver is tasked with.)
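The polling-versus-event distinction argued for above can be illustrated with a toy example. This is purely illustrative (it has nothing to do with the actual driver internals, which we can't see): a busy-poll loop wakes up repeatedly to check a flag, while an event wait blocks until signalled exactly once.

```python
import threading
import time

def wait_by_polling(done_flag: dict, interval: float = 0.001) -> int:
    """Busy-poll a completion flag; every iteration is a wakeup that
    burns CPU, even with a short sleep in between."""
    iterations = 0
    while not done_flag["done"]:
        iterations += 1
        time.sleep(interval)
    return iterations

def wait_by_event(done_event: threading.Event, timeout: float = 5.0) -> bool:
    """Block on a completion event; the thread sleeps until woken once."""
    return done_event.wait(timeout)

# Simulate "hardware" signalling completion after ~50 ms.
flag = {"done": False}
event = threading.Event()

def complete():
    time.sleep(0.05)
    flag["done"] = True
    event.set()

threading.Thread(target=complete).start()
polled = wait_by_polling(flag)    # many wakeups before completion
signalled = wait_by_event(event)  # a single wakeup when signalled
```

The event-based waiter costs essentially nothing while idle, which is the behaviour the poster is asking the driver to adopt.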
Really? I have an old Intel processor and it runs just fine, and on my gaming laptop, when it only uses the AMD iGPU, Xorg is also fine. I enabled Wayland on NVIDIA and it was terrible: everything went crazy and most applications wouldn’t run until I reverted back to Xorg.
So it is probably some issue with Xorg and a specific combination of monitor + GPU + driver.
Also, I went to Wayland with Intel because the Iris Xe drivers were disgustingly bad on X11. They are kinda fine now, but I still have weird issues like that, whereas they worked great on Wayland from the beginning.
People above were blaming NVIDIA drivers for Xorg burning CPU with some monitors, but in my case the Xorg process burns CPU in the same manner (~30% usage with some monitors) while my laptop has never even seen NVIDIA drivers, so it must be an Xorg problem, not NVIDIA.
I have clearly demonstrated (with a flame chart) that seriously undue CPU consumption under effectively idle operation, in a specific scenario/environment, can be attributed to code being executed inside the NVIDIA driver.
Please advise, based on the detailed technical analysis I am confident you will provide, where specifically to look for the alternative root cause, if not in the NVIDIA driver?
I just found out that on Xorg the issue is still there, but to my surprise it’s no longer happening on Wayland.
Running Fedora 37 with driver 520.56.06 on a Lenovo Legion 5, AMD + RTX 2060, connected to an external 1440p monitor over HDMI.
We have been able to root-cause the issue and are checking the fixes internally.
We shall communicate further once the fix is integrated and released publicly.
Same problem here. ASUS ROG Strix G15 G513RM, AMD iGPU and NVIDIA dGPU. When using an external monitor over HDMI with a Reverse PRIME configuration, the X process eats 25-30% CPU. When I use only one monitor without Reverse PRIME (internal on the iGPU, or external on the dGPU), everything is fine. I want to be able to use both monitors via Reverse PRIME without X eating a third of my CPU. I hope the fix that @amrits is talking about will be released soon and will solve this X CPU utilization problem.
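For readers who want to check whether their setup is actually using Reverse PRIME, the provider list can be queried with `xrandr --listproviders`; in a Reverse PRIME configuration the iGPU is the primary provider and the dGPU shows up as an offload/output sink. The small wrapper below is my own sketch and simply returns `None` when xrandr or an X display is unavailable.

```python
import shutil
import subprocess

def list_prime_providers():
    """Return the output of `xrandr --listproviders`, or None if xrandr
    is missing or no X display is reachable."""
    if shutil.which("xrandr") is None:
        return None
    try:
        result = subprocess.run(
            ["xrandr", "--listproviders"],
            capture_output=True, text=True, check=True,
        )
    except subprocess.CalledProcessError:
        return None
    return result.stdout
```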
I have tested the latest beta driver (525.53). To my great regret, I found that it does not contain a fix for this particular issue.
I hadn’t really expected it, because amrits has yet to “communicate further” on the matter.
But I tried nevertheless, and I’m sharing my experience here.
I hope the fix is integrated quickly, and of course released.
SETUP: ASUS TUF A17 (FA706IU), AMD 4800H, GeForce GTX 1660 Ti