X is taking up a larger amount of CPU than normal. It usually sits around 50% (as seen in top), and will sometimes cap out at 100%. When it does so, keyboard repeats are slow (and maybe other things). Once I alt-tab, the utilization goes back to 50% or so and the keyboard repeats are fine. I need to repeat this process every 30 seconds or so.
The X driver polls the GPU when it’s waiting for things that shouldn’t take a long time to complete. If you’re seeing a significant amount of CPU time spent in there, then
a) the X server is processing some kind of request or event that requires waiting for the GPU, and
b) the GPU is being unusually slow at processing whatever it is.
Your bug report doesn’t have any output from nvidia-smi because it looks like it’s not installed. Can you please install it and see if it lists any processes other than the X server using a significant amount of GPU resources? Since you seem adept at profiling and disassembling the X driver, you could also try getting a backtrace from the X server to see what it was doing when it called into the driver.
Here’s the SMI output. That third process is chrome.
.
# /opt/bin/nvidia-smi
Wed Nov 8 20:07:44 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 387.22 Driver Version: 387.22 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 950 Off | 00000000:01:00.0 On | N/A |
| 33% 34C P5 12W / 90W | 188MiB / 1996MiB | 33% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3855 G /usr/bin/X 96MiB |
| 0 4062 G /usr/lib64/firefox/firefox 1MiB |
| 0 4874 G ...-token=DB1DFF97AA93937B0E6F44FBF809F203 85MiB |
+-----------------------------------------------------------------------------+
From the pmon, X pops up higher on sm utilization, if I’m reading this correctly.
.
# /opt/bin/nvidia-smi pmon
# gpu pid type sm mem enc dec command
# Idx # C/G % % % % name
0 3855 G 48 3 0 0 X
0 4062 G 0 0 0 0 firefox
0 4874 G 0 0 0 0 chrome --type=g
0 3855 G 43 2 0 0 X
0 4062 G 0 0 0 0 firefox
0 4874 G 4 0 0 0 chrome --type=g
0 3855 G 42 3 0 0 X
0 4062 G 0 0 0 0 firefox
0 4874 G 8 0 0 0 chrome --type=g
0 3855 G 44 3 0 0 X
0 4062 G 0 0 0 0 firefox
0 4874 G 4 0 0 0 chrome --type=g
0 3855 G 51 3 0 0 X
0 4062 G 0 0 0 0 firefox
0 4874 G 0 0 0 0 chrome --type=g
0 3855 G 49 3 0 0 X
0 4062 G 0 0 0 0 firefox
0 4874 G 3 0 0 0 chrome --type=g
I tried to get perf record to give me a backtrace that would point into X. The frame pointer and dwarf method didn’t work, most likely since it can’t backtrace out of the nvidia_drv.so. I tried the lbr method, and it gave me some info, though I’m not 100% on whether I got the right info or not.
Here’s the relevant snippets from perf record -g --call-graph lbr -p X_PID for 10 seconds, followed by perf report -g caller.
It looks like X is calling Dispatch->ProcCreatePixmap. Then we lose it for a bit, and we see 0x4be12. That’s the address that pops up in nvidia_drv.so. I haven’t tried the lbr method before, so I might be interpreting this data the wrong way.
Does ProcCreatePixmap ring any bells, especially something that might call into nvidiaAddDrawableHandler? I see a couple function pointers in there, which also lines up with what I see in xorg’s assembly (maybe).
nvidiaAddDrawableHandler is a red herring, it’s just the nearest symbol so everything gets reported as an offset from that.
The fact that the server is calling ProcCreatePixmap a lot means that some client is sending that request frequently. My guess is Chrome. Does the Chrome task manager (press shift+escape to open it) show any particular tabs open that are using a lot of CPU time?
If that doesn’t pan out, I would suggest using xlsclients and then start killing clients one by one until you find the one that’s causing the high Xorg load.