Problems with VK_KHR_swapchain

Hi,
I’ve found two problems with VK_KHR_swapchain. The system is Win7 SP1 x64 with a GTX 970 and driver 368.39.

  1. There is a corner-case problem with vkCreateSwapchainKHR and vkGetPhysicalDeviceSurfaceCapabilitiesKHR. If the window is minimized or resized to the smallest possible height, vkGetPhysicalDeviceSurfaceCapabilitiesKHR returns a currentExtent of (0,0) or (width,0). Regardless of the current size, minImageExtent is (1,1) and maxImageExtent is (16384,16384). The spec says that on Windows the swapchain extent must match the current window size, yet vkCreateSwapchainKHR does not support a zero extent and fails. On the other hand, if I clamp the current extent to the min and max extents, it works, but the LunarG validation layers report an error because the swapchain extent is not the same as the current extent. So either vkGetPhysicalDeviceSurfaceCapabilitiesKHR should not report a zero extent (clamp the output), or vkCreateSwapchainKHR should accept a zero extent (clamp the input).

  2. The second problem is with vkQueuePresentKHR. When I use 2 images with FIFO presentation mode, vkQueuePresentKHR becomes a blocking call that only returns after V-Sync, so it can block for up to ca. 16.4 ms. In contrast, calls to vkAcquireNextImageKHR, waiting on its fence, or waiting on a fence from the submit of the last frame never block at all. Since this blocks not only the render thread but also access to the swapchain, there is no way to get a semaphore for the next swapchain image and pre-submit work that should start running after V-Sync. My understanding was that with a 2-image FIFO swapchain, the semaphore/fence from vkAcquireNextImageKHR could be used to wait for V-Sync on the GPU/CPU, respectively, to better schedule work. So is this a “bug” in the driver, or if not, how should this problem be solved on NVIDIA hardware?
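For problem 1, one application-side workaround is to skip swapchain creation entirely while the reported extent has a zero dimension, and to clamp only when it is non-zero. The following is a minimal sketch of that decision, not a fix for the driver or spec issue. `Extent2D` is a stand-in for `VkExtent2D` so the snippet builds without the Vulkan SDK, and `clamp_u32` and `choose_swapchain_extent` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for VkExtent2D (same two fields) so the sketch builds
 * without the Vulkan SDK. */
typedef struct Extent2D {
    uint32_t width;
    uint32_t height;
} Extent2D;

/* Clamp value into the inclusive range [lo, hi]. */
uint32_t clamp_u32(uint32_t value, uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* Hypothetical helper. Returns false when the reported extent has a zero
 * dimension (e.g. a minimized window), meaning the caller should not create
 * a swapchain yet. Otherwise writes the extent clamped to the reported
 * min/max into *out and returns true. */
bool choose_swapchain_extent(Extent2D current, Extent2D min_extent,
                             Extent2D max_extent, Extent2D *out)
{
    assert(out != NULL);
    if (current.width == 0 || current.height == 0) {
        return false; /* nothing to present into; retry after a resize */
    }
    out->width  = clamp_u32(current.width,  min_extent.width,  max_extent.width);
    out->height = clamp_u32(current.height, min_extent.height, max_extent.height);
    return true;
}
```

In real code the three inputs would come from the currentExtent, minImageExtent and maxImageExtent fields returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR. When the helper returns false, the application would pause rendering and query the capabilities again on the next resize or restore event.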

Regards

As an update, both problems are still present in the new 368.69 driver.
A statement from NVIDIA would be appreciated, especially on issue 1 since it violates the spec, even if it’s a minor issue.

Regards

I’m also wondering why vkQueuePresentKHR blocks. Would be nice to get an answer to that.

virtual_storm: couldn’t you create a queue and thread dedicated to presenting and use other queues and threads to perform rendering while vkQueuePresentKHR blocks?

@shahrouz:
Rendering in another thread with a separate queue doesn’t solve the underlying sync problem. That is, I have no way to wait on V-Sync on either the CPU or the GPU side. Granted, I could do something like this:

[render thread] submit the frame except the last shaders that write to the swapchain, then wait on present.
[present thread] wake up render thread after vkAcquireNextImageKHR with the index and semaphore.
[render thread] submit final write to the swapchain, using the present index and acquire semaphore, and send the present semaphore to the present thread. Then start rendering the next frame.
[present thread] present the swapchain using the present semaphore (this blocks)

The problem with this is that it introduces one extra frame of latency. That is, the image rendered in frame i will be presented at frame i+1, because the present needs to wait a full frame on V-Sync.

On the other hand, if I submit and present in the same frame, there is no extra frame of latency, but there is a delay on the CPU after V-Sync before I can start to submit. The lowest delay I could get from V-Sync to the start of work in hardware, measured in GPUView, is about 200 µs. That is when I submit a command buffer right after present, then call vkAcquireNextImageKHR, and then submit a command buffer that blits the result into the swapchain image to present it. GPUView shows that after V-Sync the driver sends a command buffer with a CPU wait before returning from vkQueuePresentKHR. That is separate from the present and present token packets, which are sent before V-Sync.

Additionally, even with VK_PRESENT_MODE_FIFO_KHR, vkQueuePresentKHR only blocks until V-Sync if the rendering is fast enough. If the render runs past the next V-Sync, vkQueuePresentKHR returns as soon as the render is done; it will NOT wait until the following V-Sync. The same is true for the semaphore and fence from vkAcquireNextImageKHR, so once you miss your V-Sync, you cannot resync unless you render the next frame fast enough to hit V-Sync again. And while you could (in theory) measure whether you missed your target the first time, there is no reliable way to know whether you are back in sync.
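The "measure if you miss your target" idea can be sketched as a CPU-side timing heuristic: divide the measured frame duration by the refresh period and round up to get the number of V-Sync intervals the frame spanned. This is only an estimate and does not solve the resync problem described above, since it says nothing about where in the refresh cycle the frame started. The 60 Hz refresh period and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed refresh period of a 60 Hz display, in nanoseconds.
 * A real application would query the actual display mode instead. */
#define REFRESH_PERIOD_NS 16666667ULL

/* Hypothetical helper: number of refresh intervals a frame of the given
 * duration spans, rounded up. A frame always occupies at least one. */
uint64_t vsync_intervals_spanned(uint64_t frame_ns, uint64_t refresh_ns)
{
    assert(refresh_ns > 0);
    if (frame_ns == 0) {
        return 1;
    }
    /* Integer ceiling division. */
    return (frame_ns + refresh_ns - 1) / refresh_ns;
}

/* Hypothetical helper: true when the frame took longer than one refresh
 * interval, i.e. it probably missed the V-Sync it was aiming for. */
bool probably_missed_vsync(uint64_t frame_ns, uint64_t refresh_ns)
{
    return vsync_intervals_spanned(frame_ns, refresh_ns) > 1;
}
```

The frame duration would typically be the time between two consecutive returns from vkQueuePresentKHR, taken from a monotonic clock. Scheduling jitter on the CPU can push a frame that actually hit V-Sync slightly over the threshold, so a tolerance may be needed in practice.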

Btw: if you use mailbox mode with 2 images in the swapchain, present will not block, but neither will vkAcquireNextImageKHR or the fence. The fence from the submit will block now, but of course only as long as the render takes. So in this mode, there is no V-Sync at all. This should only be the case with 3 images in the swapchain, but it behaves like this with 2 images too.

Vulkan is supposed to give you explicit control, so I don’t understand why they made swapchains behave this way. The only call that should block, based on present mode and images in the swapchain, is vkAcquireNextImageKHR, if at all. Ideally none of the functions would block, and you could control V-Sync via the semaphore and fence of vkAcquireNextImageKHR.

Did anyone manage to get anywhere with the issue of vkQueuePresentKHR blocking in FIFO mode? I just ran into the same thing and I could not find anywhere in the Vulkan spec that mentions that it may block. I expected only the wait for my fence or vkAcquireNextImageKHR to block (if passed a timeout).

I also noticed that my GTX 1060 does not seem to support the Immediate present mode, but it does support a 2-buffer Mailbox mode that does not block (my test runs at 1000+ fps). It was my understanding that a 2-buffer Mailbox mode behaves the same as FIFO.

We’re about to do some tests with an AMD card as well to see how it behaves (the AMD card appears to only have Immediate and Fifo modes), but I’m still curious as to what the correct behavior is.
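Since the set of supported present modes differs between cards, an application usually walks a preference list and falls back to FIFO, which the spec requires every implementation to support. A minimal sketch of that selection follows. `PresentMode` is a stand-in enum whose values mirror `VkPresentModeKHR`, and `pick_present_mode` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for VkPresentModeKHR; the numeric values mirror the Vulkan
 * enum so the sketch builds without the Vulkan SDK. */
typedef enum PresentMode {
    PRESENT_MODE_IMMEDIATE    = 0,
    PRESENT_MODE_MAILBOX      = 1,
    PRESENT_MODE_FIFO         = 2,
    PRESENT_MODE_FIFO_RELAXED = 3
} PresentMode;

/* Hypothetical helper: returns the first entry of `preferred` that also
 * appears in `available`. Falls back to FIFO when nothing matches,
 * because FIFO is the one mode the spec guarantees. */
PresentMode pick_present_mode(const PresentMode *available, size_t available_count,
                              const PresentMode *preferred, size_t preferred_count)
{
    assert(available != NULL || available_count == 0);
    assert(preferred != NULL || preferred_count == 0);
    for (size_t p = 0; p < preferred_count; ++p) {
        for (size_t a = 0; a < available_count; ++a) {
            if (available[a] == preferred[p]) {
                return preferred[p];
            }
        }
    }
    return PRESENT_MODE_FIFO;
}
```

In real code the `available` array would be filled by vkGetPhysicalDeviceSurfacePresentModesKHR. With a preference of Immediate then Mailbox, a card that only reports Mailbox and FIFO variants would end up on Mailbox, and a card that only reports Immediate and FIFO would end up on Immediate.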

There also doesn’t seem to be a way to sync after “N” vsyncs, like the SyncInterval parameter of IDXGISwapChain::Present?

I am also experiencing vkQueuePresentKHR blocking in FIFO mode. Even with Mailbox it seems to take a significant amount of time (almost 1 ms). It’s a bit unexpected. If it’s worth anything, I used a G-Sync-compatible laptop with a GTX 980M.

I don’t see an Immediate present mode either. And I had the same impression, Ziflin, but I guess it depends on the hardware. The spec only says that minImageCount+1 can be expected not to block when calling vkAcquireNextImageKHR.

How did the AMD card behave?