This sounds like a classic race condition for OpenGL-based compositors: the X server sends Damage extension events as soon as it has queued rendering that will change pixels on the screen, but it does not wait for that rendering to actually finish before sending the event.
If a compositor receives a Damage event and responds by sending rendering requests (e.g. using the XRender extension) back to the X server, then ordering on the GPU is guaranteed: the rendering that triggered the Damage event will happen before the rendering requested by the compositor. However, if the compositor is using direct rendering (e.g. GLX, EGL, or Vulkan) then there is no ordering guarantee: the rendering requested by the compositor can happen immediately, including reading from the window that the X server is about to render into.
The solution is to enforce the correct ordering explicitly: create XSyncFence objects, import them into OpenGL with glImportSyncEXT (from the GL_EXT_x11_sync_object extension), and tell OpenGL to wait for the X server's rendering work to finish before reading window contents. I took a quick look at the picom source and it doesn't seem to use these synchronization primitives, although I didn't dig into it in depth.
Using fences correctly is a little tricky; in particular, a fence must not be reset while a wait on it may still be in flight. Please refer to src/compositor/meta-sync-ring.c in Mutter for an example of how to use and reset fences to fix this problem.
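To make the fence dance concrete, here is a rough sketch in C. It assumes a Display and a current GL context, that GL_EXT_x11_sync_object is supported, and that glImportSyncEXT has been resolved via glXGetProcAddress; the function names sync_setup, sync_before_reading_window, and sync_reset are my own illustrative names, not part of any real compositor:

```c
/* Sketch only: serializing the X server's rendering before the
 * compositor's GL reads, via the X Sync extension plus
 * GL_EXT_x11_sync_object. Error handling omitted for brevity. */
#include <X11/Xlib.h>
#include <X11/extensions/sync.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* Assumed to be resolved with glXGetProcAddress at startup. */
extern PFNGLIMPORTSYNCEXTPROC glImportSyncEXT;

static XSyncFence x_fence;
static GLsync gl_sync;

void sync_setup(Display *dpy, Drawable d)
{
    /* Create an untriggered fence object on the server side... */
    x_fence = XSyncCreateFence(dpy, d, False);
    /* ...and import it into GL as a sync object. */
    gl_sync = glImportSyncEXT(GL_SYNC_X11_FENCE_EXT, x_fence, 0);
}

void sync_before_reading_window(Display *dpy)
{
    /* Ask the server to trigger the fence once all rendering
     * submitted so far has completed. */
    XSyncTriggerFence(dpy, x_fence);
    XFlush(dpy);
    /* Make the GPU wait for the trigger before executing any GL
     * commands issued after this point (e.g. sampling from the
     * window's pixmap). This blocks the GL server, not the CPU. */
    glWaitSync(gl_sync, 0, GL_TIMEOUT_IGNORED);
}

void sync_reset(Display *dpy)
{
    /* Reset the fence for reuse. This is the tricky part: resetting
     * a fence that a queued glWaitSync may still be waiting on is a
     * race, which is why Mutter cycles through a ring of fences
     * instead of reusing a single one immediately. */
    XSyncResetFence(dpy, x_fence);
}
```

This is only meant to show the shape of the protocol; a real implementation needs the ring structure from meta-sync-ring.c to know when a fence is safe to reset.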
Hmm, that’s a good point. It looks like the code forces that option on whenever it detects NVIDIA, so it’s not surprising that trying to change it didn’t show any effect for the user.
I can’t reproduce the problem here, so I won’t be able to investigate it myself.