Vulkan/Wayland vkQueuePresentKHR waits for GPU to finish

Update: I tracked this down to a driver issue that is fixed in the 560 beta. However, I hijacked this thread with another issue, explained in the follow-up messages: the Wayland frame callbacks now restrict the frame rate to 1/2 or 1/3 of what it should be. That second problem does not appear to be an Nvidia issue.

vkQueuePresentKHR always seems to wait for the GPU when running on Wayland. This does not happen on XWayland, X11 or Windows. It is a serious performance problem when there is both GPU and CPU load, since it can halve the frame rate.

I originally thought this was a problem with wgpu, so I reported it on Matrix. But after further investigation, it looks like it’s a generic problem on Wayland, and only on Wayland.

To verify this, I modified the vkcube sample (GitHub: fredizzimo/Vulkan-Tools, branch present-blocking) to fake delays on both the GPU and the CPU, and took captures with Nsight Systems.

On Wayland (running vkcube-wayland) it looks like this. Observe that vkQueuePresentKHR waits for the GPU, so the CPU frame time is around 13 ms. That misses the 10 ms VSYNC interval, and frames are dropped.


On XWayland and the other platforms, the CPU and GPU run in parallel, and the frame rate is a stable 100 FPS.
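A back-of-the-envelope model of why the lockstep matters: at 100 Hz (10 ms per vsync), overlapping CPU and GPU work only has to fit in the interval individually, while serialized work has to fit in total. This is a simplified Python sketch, not vkcube or driver code. The function names and the 7 ms / 6 ms split are hypothetical, and it assumes that a frame which misses a vblank simply waits for the next one.

```python
def next_vblank(t_ms, vsync_ms):
    """Round t_ms up to a whole number of vsync intervals (integer milliseconds)."""
    return -(-t_ms // vsync_ms) * vsync_ms


def frame_period_ms(cpu_ms, gpu_ms, vsync_ms, lockstep):
    """Steady-state time between presented frames in a simplified FIFO model.

    lockstep=True models vkQueuePresentKHR blocking until the GPU is done, so
    each frame's CPU and GPU work run back to back. lockstep=False models the
    usual pipelining, where the CPU records frame N+1 while the GPU renders
    frame N, so only the slower of the two sets the pace.
    """
    work_ms = cpu_ms + gpu_ms if lockstep else max(cpu_ms, gpu_ms)
    # Assumption: a frame that misses a vblank waits for the next one.
    return next_vblank(work_ms, vsync_ms)


def fps(period_ms):
    """Frames per second for a given frame period in milliseconds."""
    return 1000 / period_ms


for lockstep in (False, True):
    period = frame_period_ms(7, 6, 10, lockstep)
    print(f"lockstep={lockstep}: {period} ms per frame, {fps(period):.0f} FPS")
```

With 7 ms of CPU work and 6 ms of GPU work (made-up numbers), the pipelined case fits in one 10 ms interval, while the lockstep case needs 13 ms and only makes every other vblank.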

It seems like a serious problem, and I haven’t been able to find another report like it, either on this forum or elsewhere on the internet, despite a lot of searching.

It also does not seem to be an explicit sync issue: I’m currently running nvidia-dkms-555.58.02-1, but I have also tried nvidia-dkms-550.90.07-4. It does not seem to be kernel related either: I’m currently running 6.10.2.arch1-1, but I have also tried 6.6.42-1-lts. I have also tried both KDE Plasma 6.1.3 and Gnome, so it does not seem to be compositor related.

My GPU is a GTX 970, and I will be able to try a 3070 in a couple of days. But I don’t have any other GPUs available to test with, so I can’t tell for sure whether it’s an Nvidia-specific problem.

OpenGL applications also don’t have this GPU/CPU lockstep issue.

The present mode does not change the behaviour.

There’s also a problem with the Wayland frame callbacks (which may or may not be related), at least on KDE Plasma, which makes things even worse by cutting the frame rate to a third. This capture, from another app and using the Tracy profiler instead of Nsight this time, runs at 50 or 33 FPS instead of the expected 100 FPS, due to the present blocking.

The nvidia-beta-dkms-560.28.03-1 driver does in fact fix the first problem, the present blocking, but because of the second bug with the Wayland frame callbacks I initially failed to notice. The 560 drivers are also so unstable for me that they are practically unusable: just barely good enough to launch the app and save the capture, while launching a web browser, for example, does not work at all.

Anyway, this is how poorly the Wayland frame callbacks work on this system. For some reason, they introduce one or two frames’ worth of waiting, which prevents the app from running at 100 FPS as it should. Next, I will try to determine whether this is a KWin bug.


Edit:
Gnome has the same problem as KDE Plasma: it waits far too long before sending the frame callbacks. This workload perhaps should not run at a stable 100 FPS, but 50 FPS should be completely stable. This is how it looks with the wait for the frame callback disabled:

There it also blocks in present, but this time it looks like it’s just waiting to hit a vsync.

Edit 2:
The above capture was taken with 10 back buffers and FIFO. Here’s how it looks with one, showing that the workload should mostly be fine for 100 FPS, with some drops.

I then reduced the random sleep amount to get it to run consistently under 10 ms without frame callbacks, but this is the result when enabling them:

Since this problem occurs in both Gnome and KDE, my best guess at the moment is that it’s an Nvidia bug. Or possibly a bug in my application, but it does nothing more than call winit’s pre_present_notify just before calling present through wgpu, which I believe is the correct way to do it.

PS. I also managed to get the 560 drivers working decently by installing egl-wayland-git, so I should be able to do much better testing now.

After more investigation, I don’t think that this is an Nvidia bug. It’s just that the frame callbacks in the compositors are quite naive, and don’t take into account that the GPU and CPU can run in parallel.

Consider this:

The compositor runs in the highlighted region, and when it’s done it sends the frame callback that wakes the scene_viewer up. The problem is that it waits for the GPU to finish the frame, so the CPU sits idle while the GPU and the compositor do their work. This seriously limits the CPU work that can be done in a frame, and also the amount of GPU work, since the more time you spend on the CPU, the less time is left for the GPU.

Sometimes the callback is delayed by another frame, because the frame wasn’t ready in time for the compositor. The frame callbacks are not really synchronized to the vsync, except when there’s a miss; the first frame after a miss is then offset from the vsync by however long the compositor takes.
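The serialization described above can be illustrated with a toy timeline. This is a simplified Python model under stated assumptions, not how KWin, Mutter or any other compositor is actually implemented: the compositor only picks up a client buffer at a vblank after its GPU work has finished, composites for a fixed compositor_ms, and only then sends the frame callback that wakes the client. All names and timings are hypothetical, in whole milliseconds.

```python
def next_vblank(t_ms, vsync_ms):
    """Earliest vblank at or after t_ms (vblanks at 0, vsync_ms, 2 * vsync_ms, ...)."""
    return -(-t_ms // vsync_ms) * vsync_ms


def simulate_frame_callbacks(cpu_ms, gpu_ms, vsync_ms, compositor_ms, frames):
    """Return the vblank at which the compositor picks up each client frame.

    The client only starts a frame when its frame callback arrives, and the
    callback is only sent after the compositor has used the previous buffer,
    which in turn requires the GPU to have finished rendering it. CPU and GPU
    therefore never overlap in this model.
    """
    wake = 0  # time at which the client's frame callback fires
    latched = []
    for _ in range(frames):
        ready = wake + cpu_ms + gpu_ms          # CPU records and submits, then GPU renders
        vblank = next_vblank(ready, vsync_ms)   # compositor picks it up at the next vblank
        latched.append(vblank)
        wake = vblank + compositor_ms           # callback is sent after composition finishes
    return latched


def average_fps(latched):
    """Average frame rate over the simulated frames."""
    return 1000 * (len(latched) - 1) / (latched[-1] - latched[0])


slow = simulate_frame_callbacks(cpu_ms=4, gpu_ms=4, vsync_ms=10, compositor_ms=3, frames=5)
fast = simulate_frame_callbacks(cpu_ms=2, gpu_ms=2, vsync_ms=10, compositor_ms=3, frames=5)
print(slow, average_fps(slow))
print(fast, average_fps(fast))
```

With these made-up numbers, 8 ms of combined CPU and GPU work fits in a 10 ms interval, yet adding the 3 ms compositor delay before the client can start pushes every frame past a vblank and halves the rate to 50 FPS. Only when compositor, CPU and GPU time together fit in one interval does the model reach 100 FPS.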

The problem is that we have to either rely on frame callbacks or render on another thread in order not to block the main Wayland event loop; present will not finish if the window is hidden, for example. But maybe there is a set of rules (such as whether the window is hidden) that we could check before presenting to guarantee that it never blocks?
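The "render on another thread" option can be sketched as follows. This is a minimal Python illustration, not wgpu or winit code: HiddenAwareSurface, start_renderer and dispatch are hypothetical names, and the present that cannot finish while the window is hidden is simulated with a threading.Event that stays unset until the window becomes "visible".

```python
import threading


class HiddenAwareSurface:
    """Hypothetical surface whose present() blocks until the window is visible,
    mimicking a present that cannot complete while the window is hidden."""

    def __init__(self):
        self.visible = threading.Event()
        self.presented = 0  # only ever modified by the render thread

    def present(self):
        self.visible.wait()  # blocks indefinitely while the window is hidden
        self.presented += 1


def start_renderer(surface, frames):
    """Render and present on a dedicated thread so a blocking present
    never stalls the main-thread event loop."""

    def loop():
        for _ in range(frames):
            # ... record and submit the GPU work for this frame here ...
            surface.present()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


def dispatch(events):
    """Stand-in for the main-thread Wayland event dispatch."""
    handled = []
    for event in events:
        handled.append(event)
    return handled


surface = HiddenAwareSurface()
renderer = start_renderer(surface, frames=3)
handled = dispatch(["configure", "pointer_motion", "key"])  # still responsive while hidden
surface.visible.set()  # the window is shown again, so the blocked present can finish
renderer.join(timeout=5)
```

The event loop keeps handling events while the render thread is stuck in present; the cost is that the application now has to synchronize state between the two threads.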