The driver version 525.78.01 release notes suggest that a fix was made; there are two items worth reporting:
a) on my Dell 7610 with Nvidia 3060, Fedora 37, NVIDIA drivers via rpmfusion, KDE 5.26.4, with an LG 4K screen connected and displayed “to the left” of the laptop, the excessive CPU load - i.e. the performance issue - is gone; this also applies to Ubuntu 20.04 LTS with Mutter.
b) during testing, though, I noticed serious functional misbehaviour when running glxgears; steps:
- boot box
- log into an X11 desktop environment (Fedora 37 + KDE; Ubuntu 20.04 LTS + Mutter)
- run glxgears (without PRIME offload)
- make sure that glxgears runs on the screen connected to the Nvidia GPU
- … and wait for 60 seconds to observe, eventually and repeatedly, one of the following two broken behaviours:
- screen connected to the NVIDIA output turns black (but recovers within a second)
- some garbage rectangles (pink pixel garbage) show up, and get cleared again in what appears to be random sizes and random locations, but only on the screen attached to the NVIDIA GPU
This faulty behaviour also occurs with PRIME render offload enabled, all other things being equal:
__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxgears
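For anyone who wants to retry both variants, here is a minimal sketch of the two glxgears runs as a small shell helper. The names run_gears, GEARS_CMD and DRY_RUN are made up for illustration; only the two environment variables and glxgears itself come from the steps above, and timeout is assumed to be the coreutils one. The window still has to be moved by hand onto the screen attached to the NVIDIA GPU.

```shell
#!/bin/sh
# Sketch only: run glxgears for a fixed time, with or without PRIME offload.
# GEARS_CMD and DRY_RUN are hypothetical knobs, not part of any driver tooling.
GEARS_CMD="${GEARS_CMD:-glxgears}"

run_gears() {
    mode="$1"          # "plain" or "offload"
    secs="${2:-60}"    # how long to keep glxgears running
    case "$mode" in
        plain)   set -- ;;
        offload) set -- __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia ;;
        *) echo "usage: run_gears plain|offload [seconds]" >&2; return 2 ;;
    esac
    # Build the full command line: env [VAR=value ...] timeout SECS glxgears
    set -- env "$@" timeout "$secs" "$GEARS_CMD"
    if [ -n "${DRY_RUN:-}" ]; then
        # Only print what would be executed.
        echo "$*"
        return 0
    fi
    "$@"
}
```

Calling run_gears plain and then run_gears offload gives two 60 second runs; with DRY_RUN=1 set, the helper only prints the command line it would execute.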
This faulty behaviour does not occur when the output screen is the laptop’s 3072x1920 (3K!) Intel GPU driven built-in screen.
This misbehaviour is not limited to glxgears; it also applies to, for instance, Firefox or Visual Studio Code when those application windows are moved around with the mouse; once the corruption is present, moving the mouse will nicely animate the pixel garbage.
The photo below shows the pixel garbage - it appears in totally random locations, nowhere close to where I would expect repaint damage. The garbage cannot be captured using a screenshot tool, e.g. KDE’s Spectacle; everything looks fine there, hence the photo:
And, bonus misbehaviour: I think while glxgears was running in the background on the NVIDIA GPU screen, and while I was doing some wiggling on that screen, I got something to totally lock up, because the content on the NVIDIA GPU screen was totally frozen. Things did come back after I disabled the screen via KDE Display Configuration and re-enabled it. But this smelled a lot like a deadlock (somewhere).
I see exactly nothing of that reflected in logs as WARN or ERROR (and I really would expect at least the screen going all black to emit some kind of diagnostic). I expect that
Jan 07 14:18:13 fedora.home kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67e:0:0:1128
Jan 07 14:18:15 fedora.home kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67e:0:0:1128
is the result of my “fixing” the deadlock, not something that was emitted by the running driver.
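To check whether the kernel log contains anything beyond these two lines, a filter along the following lines can help. It is only a sketch: filter_modeset_errors and count_modeset_errors are hypothetical helper names, and the match string is simply taken from the log lines quoted above.

```shell
#!/bin/sh
# Sketch only: keep nvidia-modeset error lines from kernel log text on stdin.
filter_modeset_errors() {
    # grep exits non-zero when nothing matches; "|| true" treats that as "no errors".
    grep -F 'nvidia-modeset: ERROR' || true
}

# Count the matching lines on stdin.
count_modeset_errors() {
    filter_modeset_errors | wc -l | tr -d ' '
}
```

On a systemd box, the kernel messages of the current boot can be fed in with journalctl -k -b --no-pager | filter_modeset_errors, so the timestamps can be compared against the moment the screen went black or froze.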