Nvidia, please get it together with external monitors on Wayland

@abchauhan here are the perf sample captures: one with the compositor running on Intel (the problematic one), and one with the compositor running on NVIDIA.
Both tests were conducted on the same paused scene in a Dota 2 replay, with the same graphics settings, V-Sync on, and the game's window full-screen on the external monitor, which is connected via DisplayPort. Furthermore, the Dota process was isolated to CPUs using taskset, so that no other process would cause scheduling contention.
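The post doesn't say which CPUs the Dota process was pinned to, so any CPU list below is made up. For anyone reproducing the setup against an already running process, here is a minimal Python sketch that mirrors taskset's all-tasks mode using os.sched_setaffinity (Linux only). pin_all_threads is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import os


def pin_all_threads(pid: int, cpus: set[int]) -> None:
    """Hypothetical helper: restrict every existing thread of `pid` to `cpus`,
    roughly what `taskset -a -pc <list> <pid>` does."""
    # sched_setaffinity() applies to a single thread ID, so walk
    # /proc/<pid>/task to reach every thread. Threads created later
    # inherit the affinity of the thread that spawns them.
    # Note: this only limits where these threads may run; it does not by
    # itself keep other processes off the same CPUs.
    for tid in os.listdir(f"/proc/{pid}/task"):
        try:
            os.sched_setaffinity(int(tid), cpus)
        except ProcessLookupError:
            # The thread exited between listdir() and the affinity call.
            pass
```

For example, pin_all_threads(pid, {2, 3, 4, 5}) would correspond to taskset -a -pc 2-5 <pid>. Launching the game under taskset -c from the start avoids the walk entirely, since every thread inherits the mask.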
The samples were captured with

sudo perf record -F 99 -p <pid> -g -o <file-name>.data

and converted into a viewable format using perf script.
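If you'd rather pull raw per-thread counts like the ones quoted below without a GUI, the perf script text (e.g. from perf script -i <file-name>.data > out.txt) can also be tallied directly. This is a minimal sketch under assumptions: the header and callchain layouts in the regexes are what default perf script output typically looks like, but the exact columns vary with perf version and record options. count_samples is a hypothetical helper, and it matches the symbol by substring so that "sched_yield" catches however the libc symbol is decorated.

```python
import re
from collections import Counter

# ASSUMED layout of a default `perf script` sample header, e.g.
#   "dota2  4321 [003] 1234.567890:   10101010 cycles:u: "
# i.e. thread name, TID (optionally PID/TID), optional [cpu], timestamp.
HEADER = re.compile(
    r"^(?P<comm>.*\S)\s+(?:\d+/)?(?P<tid>\d+)\s+(?:\[\d+\]\s+)?\d+\.\d+:"
)
# ASSUMED callchain line: leading whitespace, hex address, symbol, (dso), e.g.
#   "\t    7f8a1c2d3e4f __sched_yield+0xb (/usr/lib/libc.so.6)"
FRAME = re.compile(r"^\s+[0-9a-f]+\s+(?P<sym>\S+)")


def count_samples(text: str, symbol: str) -> tuple[Counter, Counter]:
    """Hypothetical helper: per-thread sample totals, and how many of those
    samples have `symbol` (substring match) anywhere in their call stack."""
    total: Counter = Counter()
    matching: Counter = Counter()
    thread, hit = None, False
    # A trailing "" flushes the last sample if the text lacks a final blank line.
    for line in text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends the current sample.
            if thread is not None and hit:
                matching[thread] += 1
            thread, hit = None, False
        elif line[0].isspace():
            # Callchain line belonging to the current sample.
            frame = FRAME.match(line)
            if thread is not None and frame and symbol in frame["sym"]:
                hit = True
        else:
            # New sample header; unrecognized headers are skipped.
            header = HEADER.match(line)
            thread = (header["comm"], int(header["tid"])) if header else None
            hit = False
            if thread is not None:
                total[thread] += 1
    return total, matching
```

Threads are keyed by (name, TID), so a renderer thread can be picked out of the result by name; dividing its count by the capture length gives a rate that is comparable across captures of different durations.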

I suggest you check the profiling information using a viewer such as Firefox Profiler, since it'll let you dive into individual threads and make flame graphs for you.

From a bird's-eye view, you immediately spot that in both cases _sched_yield is what takes up the most time in the Vulkan renderer thread. But with the compositor on Intel, you'll only find 354 samples in all 26 seconds, while with it on NVIDIA, you can find 654 samples in 22 seconds: almost twice as many, in less time. This means latency is much lower and the thread gets to do more work at the right time, instead of waiting.
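Because the two captures have different lengths (26 s vs 22 s), normalizing the counts quoted above to samples per second makes the comparison a bit more direct. This is only arithmetic on the numbers already in this post, nothing newly measured:

```python
# Sample counts and capture lengths quoted above for the Vulkan renderer
# thread's sched_yield samples.
INTEL_SAMPLES, INTEL_SECONDS = 354, 26    # compositor on the Intel GPU
NVIDIA_SAMPLES, NVIDIA_SECONDS = 654, 22  # compositor on the NVIDIA GPU

intel_rate = INTEL_SAMPLES / INTEL_SECONDS     # ~13.6 samples/s
nvidia_rate = NVIDIA_SAMPLES / NVIDIA_SECONDS  # ~29.7 samples/s

raw_ratio = NVIDIA_SAMPLES / INTEL_SAMPLES  # ~1.85x, ignoring capture length
rate_ratio = nvidia_rate / intel_rate       # ~2.18x, per second of capture

print(f"Intel:  {intel_rate:.1f} samples/s")
print(f"NVIDIA: {nvidia_rate:.1f} samples/s")
print(f"raw ratio {raw_ratio:.2f}x, per-second ratio {rate_ratio:.2f}x")
```

Per second of capture, the run with the compositor on NVIDIA shows a little more than twice as many samples as the Intel one.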

dotaperf-compositor-.zip (973.5 KB)