High CPU usage on xorg when the external monitor is plugged in

The driver version 525.78.01 release notes suggest that a fix was made; there are two items worth reporting:

a) on my Dell 7610 with Nvidia 3060, Fedora 37, NVIDIA drivers via rpmfusion, KDE 5.26.4, with an LG 4K screen connected and displayed “to the left” of the laptop, the excessive CPU load - i.e. the performance issue - is gone; this also applies to Ubuntu 20.04 LTS with Mutter.

b) during testing, though, I noticed serious functional misbehaviour when running glxgears; steps to reproduce:

  • boot box
  • log into an X11 desktop environment (Fedora 37 + KDE; Ubuntu 20.04 LTS + Mutter)
  • start glxgears (without PRIME offload)
  • make sure that glxgears runs on the screen connected to the Nvidia GPU
  • wait about 60 seconds; eventually, and repeatedly, one of the following two broken behaviours shows up:
      • the screen connected to the NVIDIA output turns black (but recovers within a second)
      • rectangles of pink pixel garbage show up and get cleared again, in what appear to be random sizes and random locations, but only on the screen attached to the NVIDIA GPU

This faulty behaviour also occurs with __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxgears (all other things being equal).
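For anyone who wants to repeat both variants (with and without PRIME offload) in the same way, here is a minimal sketch in Python. The two environment variables are the ones from the command above; the helper names `glxgears_env` and `run_glxgears` are made up for illustration, and the sketch assumes `glxgears` is on the PATH and an X11 session is running.

```python
import os
import subprocess

# Variable names and values taken from the command quoted in the post.
OFFLOAD_VARS = {
    "__NV_PRIME_RENDER_OFFLOAD": "1",
    "__GLX_VENDOR_LIBRARY_NAME": "nvidia",
}


def glxgears_env(offload, base=None):
    """Return a copy of the environment, adding the offload variables on request."""
    env = dict(os.environ if base is None else base)
    if offload:
        env.update(OFFLOAD_VARS)
    return env


def run_glxgears(seconds=60, offload=False):
    """Hypothetical helper: run glxgears for `seconds`, stop it, return captured stdout."""
    proc = subprocess.Popen(
        ["glxgears"],
        env=glxgears_env(offload),
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        out, _ = proc.communicate(timeout=seconds)
    except subprocess.TimeoutExpired:
        # glxgears runs until closed, so the timeout is the normal exit path here.
        proc.terminate()
        out, _ = proc.communicate()
    return out
```

Calling `run_glxgears(60, offload=False)` and then `run_glxgears(60, offload=True)` while watching the NVIDIA-attached screen covers the two cases described above. The corruption itself still has to be spotted by eye.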

This faulty behaviour does not occur when the output screen is the laptop’s built-in 3072x1920 (3K!) screen, which is driven by the Intel GPU.

This misbehaviour is not limited to glxgears; it also applies to, for instance, Firefox or Visual Studio Code when those application windows are moved around with the mouse; once the corruption is present, moving the mouse will nicely animate the pixel garbage.

The photo below shows the pixel garbage - it appears in totally random locations, nowhere close to where I would expect repaint damage. The garbage cannot be captured using a screenshot tool, e.g. KDE’s "Spectacle"; all looks good there, hence the photo:

And, bonus misbehaviour: I think while glxgears was running in the background on the NVIDIA GPU screen, and while I was doing some wiggling on that screen, I got something to totally lock up, because the content on the NVIDIA GPU screen was totally frozen. Things did come back after I disabled the screen via KDE Display Configuration and re-enabled it. But this smelled a lot like a deadlock (somewhere).

I see exactly nothing of that reflected in logs as WARN or ERROR (and I really would expect at least the screen going all black to emit some kind of diagnostic). I expect that

Jan 07 14:18:13 fedora.home kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67e:0:0:1128
Jan 07 14:18:15 fedora.home kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67e:0:0:1128

is the result of my “fixing” the deadlock, not something that was emitted by the running driver.
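To check whether the driver logged anything else around the time of the corruption, the kernel log can be filtered for nvidia-modeset lines. Below is a rough sketch that assumes the log text has the same shape as the two lines quoted above (for example, the output of `journalctl -k --no-pager` saved to a string); the function name `modeset_messages` is just illustrative.

```python
import re

# Pattern modelled on the two log lines quoted in the post, e.g.
# "Jan 07 14:18:13 fedora.home kernel: nvidia-modeset: ERROR: GPU:0: ..."
LINE_RE = re.compile(
    r"^(?P<time>\w{3} \d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"nvidia-modeset: (?P<level>[A-Z]+): (?P<message>.*)$"
)


def modeset_messages(log_text):
    """Return one dict (time, host, level, message) per nvidia-modeset line."""
    found = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            found.append(match.groupdict())
    return found
```

Comparing the timestamps of the returned entries with the moment the screen was disabled and re-enabled would show whether the "Idling display engine timed out" errors really come from that manual step.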

The high CPU usage is greatly improved, but with no DE apps running, CPU usage is still higher with 1 or 2 external monitors enabled (6-8%) than with none (2-3%).
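To make numbers like these comparable between machines, one option is to sample the Xorg process's CPU time from /proc over a fixed interval. This is only a sketch: it assumes Linux, relies on the utime/stime fields (14 and 15) of /proc/<pid>/stat, and the function names are invented for this example.

```python
import os
import time


def parse_stat_ticks(stat_line):
    """Return utime + stime (in clock ticks) from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may contain spaces,
    # so split after the last ')' before counting fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state), so utime (14) and stime (15) are at 11 and 12.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, clk_tck):
    """Share of one CPU core used over the interval, in percent."""
    cpu_seconds = (ticks_after - ticks_before) / clk_tck
    return 100.0 * cpu_seconds / interval_s


def sample_pid(pid, interval_s=10.0):
    """Measure a running process (e.g. the Xorg PID) over `interval_s` seconds."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = parse_stat_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_stat_ticks(f.read())
    return cpu_percent(before, after, interval_s, clk_tck)
```

Running `sample_pid(<Xorg PID>)` once with the external monitors enabled and once with them disabled gives two figures measured the same way.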

Also, the ordering/assignment of displays is all messed up. If I set my laptop display as primary, the taskbar gets moved to the 3rd monitor. If I force an app to run on primary, it runs on the 2nd monitor. When I disable both displays, go to sleep, and wake up, the 2nd monitor re-enables itself (likely due to hotplug events in the driver detecting a “new display” during wake).

So…better than not being able to use it at all, but buggy AF. I’m going to try your test and see what happens.

I have a 5800h (AMD) + 3070 laptop, with mux. I only tested in hybrid mode (internal display = AMD, external displays = nV). I ran multiple glxgears, including with no frame rate cap, and several concurrently, and I don’t see any corruption. Interestingly, the instance running on the AMD driver, even with “reverse prime” (AMD driver, displayed on an nV display), maintains >15k fps…the nV instance hovers around 4-5k fps (on an nV monitor), and if I drag it to the AMD display it drops to 3-4k fps.
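For comparing frame rates between the instances, the periodic lines glxgears prints can be averaged rather than eyeballed. A small sketch, assuming the usual output format of the form "N frames in S seconds = F FPS"; the helper names are hypothetical.

```python
import re

# Matches lines such as "20000 frames in 5.0 seconds = 4000.000 FPS".
FPS_RE = re.compile(r"(\d+) frames in ([\d.]+) seconds = ([\d.]+) FPS")


def fps_samples(output):
    """Return every FPS figure found in the captured glxgears output."""
    return [float(m.group(3)) for m in FPS_RE.finditer(output)]


def mean_fps(output):
    """Average FPS over all samples, or None if no sample line was found."""
    samples = fps_samples(output)
    return sum(samples) / len(samples) if samples else None
```

Feeding it the captured output of each instance (AMD, nV, and the dragged-across cases) gives one average per configuration.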

Performance mode is “auto” but it pegs itself to the max clocks.

I think the issues you’re having likely have something to do with the Intel hybrid config.

I’ve downloaded the latest driver, 525.78.01, but no luck: CPU usage is still high at 25-30%.
nvidia-bug-report.log.gz (431.0 KB)

For me, after updating to 525.78.01 the CPU usage dropped to 3%-8%. That’s good, but still not acceptable. When using only dGPU or iGPU the CPU usage is around 0.3%. I hope this issue will be fixed too.

Also, I’ve noticed a strange little issue. When I’m using an external display in a reverse PRIME setup and playing games on it (no matter whether via dGPU or iGPU), the picture during camera movements is not as smooth as when using only one GPU (dGPU or iGPU, external monitor or laptop, it doesn’t matter) without reverse PRIME. I’ve noticed this in native Linux games such as SuperTuxKart, Yamagi Quake 2 and 0AD. Vsync configurations don’t affect this issue at all.