presentKHR still blocks on Windows even when using VK_KHR_present_wait

NVidia Vulkan drivers have had a long-standing technical debt where presentKHR blocks until the image is presented, which can take a very long time in FIFO mode.
Even if that is a valid implementation, it’s a tough pill to swallow, since the spec doesn’t even provide a timeout parameter for that call.

The VK_KHR_present_wait extension is supposed to provide a vkWaitForPresentKHR function which can be used to wait for presentation.

The NVidia Windows Vulkan drivers claim to implement that extension, but according to my tests, the blocking behavior on presentKHR is still there, which makes the utility of this extension a bit dubious.

Is this behavior as intended, or is that a problem on my end?

I’ve attached the demo I used for my tests, which is just vkcube.cpp extended to use the extension.

The first patch is the main implementation, the second patch adds printing of some metrics, and I’ve also attached the full source code for convenience.

The diff is against the demo in SDK version 1.3.243.0.

FWIW, the same demo works as expected when tested on a Linux setup under Wine with Intel graphics.

On NVidia Windows it prints metrics which suggest that all the waiting happens in presentKHR:
present:13.488300ms waitForPresent:0.059500ms
On Linux (Wine) with Intel graphics it prints metrics which suggest no blocking in presentKHR; all the waiting happens in waitForPresent:
present:0.151900ms waitForPresent:30.070200ms
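For anyone trying to reproduce numbers like these: the patch itself is only attached, not reproduced here, so the following is just a minimal sketch of one way such per-call timings could be collected in C++. It times each call with `std::chrono::steady_clock` and prints a line in the same format as the output above. `stand_in_present` and `stand_in_wait_for_present` are hypothetical placeholders that merely sleep; in the demo they would be the calls to vkQueuePresentKHR and vkWaitForPresentKHR.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <thread>
#include <utility>

// Run `f` and return its wall-clock duration in milliseconds.
template <class F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto end = std::chrono::steady_clock::now();
    assert(end >= start); // steady_clock is monotonic, so this never fails
    return std::chrono::duration<double, std::milli>(end - start).count();
}

// Hypothetical stand-ins that only sleep; a real demo would wrap
// vkQueuePresentKHR and vkWaitForPresentKHR here instead.
void stand_in_present() { std::this_thread::sleep_for(std::chrono::milliseconds(2)); }
void stand_in_wait_for_present() { std::this_thread::sleep_for(std::chrono::milliseconds(1)); }

// Time both calls for one frame and print them in the "present:...ms waitForPresent:...ms" shape.
void report_frame() {
    const double present = time_ms(stand_in_present);
    const double wait = time_ms(stand_in_wait_for_present);
    std::printf("present:%fms waitForPresent:%fms\n", present, wait);
}
```

`%f` prints six decimal places, which matches the format of the lines quoted above; whichever call absorbs the frame's waiting shows up as the large number.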

The affected setup is a Windows 11 machine with an RTX 3060 connected to a PHL 499P9 monitor.

Thanks.
commit-420063b (8.2 KB)
commit-678d8b8 (1.7 KB)
cube.cpp (138.0 KB)

Hello @mizvekov, thanks for pulling the issue from Discord! And welcome to the NVIDIA developer forums.

Let’s see if we can get some feedback.

Ping, in case this was forgotten about.

Thanks.

I’m not from NVIDIA.

And sorry for the necropost; I just want to point out several things for anyone who bumps into this post (like I just did):

The NVidia Windows Vulkan drivers claim to implement that extension, but according to my tests, the blocking behavior on presentKHR is still there, which makes the utility of this extension a bit dubious.

Is this behavior as intended, or is that a problem on my end?

This behavior is allowed by the standard. The spec says: “Calls to vkQueuePresentKHR may block, but must return in finite time”.

Originally, drivers were intended to block in vkAcquireNextImageKHR, which does have a timeout parameter, but due to various technicalities the information needed is not known until vkQueuePresentKHR, so on certain OSes drivers chose to block there instead.

Mesa RADV on X11 on Linux also waits on vkQueuePresentKHR.

The VK_KHR_present_wait extension is supposed to provide a vkWaitForPresentKHR function which can be used to wait for presentation.

The NVidia Windows Vulkan drivers claim to implement that extension, but according to my tests, the blocking behavior on presentKHR is still there, which makes the utility of this extension a bit dubious.

That is not what VK_KHR_present_wait is for. vkQueuePresentKHR blocks until at least one swapchain image has become available again (e.g. you’ve created a swapchain with 4 images and you’ve submitted all 4 of them).
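To make that blocking concrete, here is a toy C++ model of FIFO presentation. It is an illustration under simplifying assumptions, not how any driver actually works: the display takes one queued present per vblank, and a present call blocks while `max_pending` presents (a stand-in for "no free swapchain image") are still waiting. `FifoModel` and `present_block_times` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Toy FIFO display (illustration only): one queued present is latched per vblank,
// and present() blocks while `max_pending` presents are still waiting for a vblank.
// All times are simulated milliseconds.
struct FifoModel {
    double refresh_ms;   // interval between vblanks
    int max_pending;     // presents that may be queued before present() blocks
    int queued = 0;      // presents currently waiting for a vblank
    double next_vblank;  // simulated time of the next vblank

    FifoModel(double refresh, int pending)
        : refresh_ms(refresh), max_pending(pending), next_vblank(refresh) {
        assert(refresh > 0.0 && pending >= 1);
    }

    // Let the display run up to time t, retiring one queued present per vblank.
    void advance_to(double t) {
        while (next_vblank <= t) {
            if (queued > 0) --queued;
            next_vblank += refresh_ms;
        }
    }

    // Queue a present issued at time t and return how long the call blocked.
    double present(double t) {
        advance_to(t);
        const double issued = t;
        while (queued >= max_pending) {  // no free image: sleep until the next vblank
            t = next_vblank;
            advance_to(t);
        }
        ++queued;
        return t - issued;
    }
};

// Produce `frames` frames that each take `render_ms`, returning per-present block times.
std::vector<double> present_block_times(double refresh_ms, int max_pending,
                                        double render_ms, int frames) {
    FifoModel display(refresh_ms, max_pending);
    std::vector<double> blocked;
    double t = 0.0;
    for (int i = 0; i < frames; ++i) {
        t += render_ms;                    // render and submit the frame
        const double b = display.present(t);
        t += b;                            // the thread is stuck in present meanwhile
        blocked.push_back(b);
    }
    return blocked;
}
```

With a 10 ms refresh, 1 ms per frame and `max_pending = 3`, `present_block_times(10.0, 3, 1.0, 6)` gives block times of 0, 0, 0, 6, 9, 9 ms: presents return immediately until the queue is full, and from then on each one blocks for roughly the refresh interval minus the frame time. With `max_pending = 1` the blocking starts on the second frame (0, 8, 9, 9 ms).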

This extension lets you wait until a specific present (identified by the ID supplied through VK_KHR_present_id) has actually been presented, to prevent the CPU from getting too far ahead of presentation and thus reduce latency. This is useful when you don’t want to be more than e.g. 1 or 2 frames behind, but VkSwapchainCreateInfoKHR::minImageCount is substantially larger.
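The pacing idea can be sketched with a similar toy model. Before rendering frame `id`, the application waits until present `id - frames_behind` has reached the screen; in real code the IDs would be attached with VkPresentIdKHR in the pNext chain of VkPresentInfoKHR and waited on with vkWaitForPresentKHR(device, swapchain, presentId, timeout). Everything else here (`simulate_present_wait`, `FrameTiming`, one flip per vblank, `max_pending`) is a hypothetical simplification, and it models the behaviour the extension is meant to enable, not what any particular driver does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Simulated time (ms) spent in each blocking call for one frame.
struct FrameTiming {
    double wait_for_present_ms; // modeled vkWaitForPresentKHR
    double present_ms;          // modeled vkQueuePresentKHR
};

// First vblank strictly after time t, with vblanks at refresh_ms, 2*refresh_ms, ...
double next_vblank_after(double t, double refresh_ms) {
    return (std::floor(t / refresh_ms) + 1.0) * refresh_ms;
}

// Hypothetical toy FIFO model: one present is shown per vblank, and present blocks
// while `max_pending` earlier presents are still waiting to be shown.
// frames_behind == 0 disables present-wait pacing.
std::vector<FrameTiming> simulate_present_wait(double refresh_ms, int max_pending,
                                               double render_ms, int frames_behind,
                                               int frame_count) {
    assert(refresh_ms > 0.0 && max_pending >= 1 && frames_behind >= 0 && frame_count >= 0);
    // shown[id]: time at which present `id` reaches the screen (present IDs start at 1).
    std::vector<double> shown(frame_count + 1, 0.0);
    std::vector<FrameTiming> out;
    double t = 0.0;
    for (int id = 1; id <= frame_count; ++id) {
        FrameTiming ft{0.0, 0.0};
        // Present-wait pacing: don't start frame `id` until present `id - frames_behind` is shown.
        if (frames_behind > 0 && id - frames_behind >= 1) {
            const double until = shown[id - frames_behind];
            ft.wait_for_present_ms = std::max(0.0, until - t);
            t = std::max(t, until);
        }
        t += render_ms; // record and submit the frame
        // Present blocks until the oldest of the `max_pending` queued presents has been shown.
        if (id - max_pending >= 1) {
            const double until = shown[id - max_pending];
            ft.present_ms = std::max(0.0, until - t);
            t = std::max(t, until);
        }
        // FIFO: shown at the next vblank, but never in the same vblank as the previous present.
        double when = next_vblank_after(t, refresh_ms);
        if (id > 1) when = std::max(when, shown[id - 1] + refresh_ms);
        shown[id] = when;
        out.push_back(ft);
    }
    return out;
}
```

With a 10 ms refresh, `max_pending = 1` and 1 ms frames, `frames_behind = 0` (no present wait) puts all the waiting inside present (0, 8, 9, 9 ms), while `frames_behind = 1` moves roughly 9 ms per frame into the wait and leaves present at 0 ms, the same kind of split the Intel/Wine numbers in the original post show.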

While you can use VkFences to keep the CPU from getting too far ahead of the GPU, a VkFence tells you when the GPU has finished rendering frame N, not when frame N has actually been presented. This gap between work done and presentation increases latency.
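A small toy calculation shows the gap being described. Given when each frame’s GPU work finished (the point at which its VkFence would signal), it computes when the frame would reach the screen if the display flips one queued frame per vblank, in order. `display_times` is a hypothetical helper, and the model deliberately ignores the swapchain’s image limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// First vblank strictly after time t, with vblanks at refresh_ms, 2*refresh_ms, ...
double next_vblank_after(double t, double refresh_ms) {
    return (std::floor(t / refresh_ms) + 1.0) * refresh_ms;
}

// Toy FIFO display: given when each frame's GPU work finished (when its fence would
// signal), return when each frame is actually shown, one flip per vblank in order.
// The swapchain's image limit is ignored, so the queue here can grow without bound.
std::vector<double> display_times(const std::vector<double>& gpu_done_ms, double refresh_ms) {
    assert(refresh_ms > 0.0);
    std::vector<double> shown;
    for (double done : gpu_done_ms) {
        double when = next_vblank_after(done, refresh_ms);
        // A frame can't be shown in the same vblank as the one before it.
        if (!shown.empty()) when = std::max(when, shown.back() + refresh_ms);
        shown.push_back(when);
    }
    return shown;
}
```

For GPU completion at 1, 2 and 3 ms with a 10 ms refresh, the frames are shown at 10, 20 and 30 ms, so the fence-to-screen gap is 9, 18 and 27 ms even though every fence signalled almost immediately. In a real swapchain the image limit eventually stops that growth by blocking the application, which is exactly where present blocking or present wait comes in.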

Thanks for the response.

I think the problem is that for a swapchain with a very small number of images (i.e. 2), VK_KHR_present_wait on the NVidia driver does not do anything useful: it just returns immediately. On other drivers, even those which normally block in vkQueuePresentKHR, it does wait until a point where you can call present without blocking at all.

I think the vkcube demo I provided does a nice job of showing the problem.
It’s a shame a demonstration of this extension was never incorporated there.

If the NVidia driver’s behavior is allowed, I think it is still an issue that there is no way to query whether the implementation is going to behave that way.