Severe user input lag in Vulkan on Windows

Hi,

When using the FIFO present mode in Vulkan, other users working on the same project and I are experiencing user input lag of almost a second on Windows 10 - enough to make games completely unplayable. We are using the GLFW3 library to create windows.

This used to occur only with fullscreen windows, and could be worked around by setting the NVIDIA Control Panel setting “Vulkan/OpenGL Present method” to “Prefer layered on DXGI Swapchain”, but since the latest driver update from 552.22 to 556.12, not even this works any more, and the problem now affects both normal and fullscreen windows.

Is there anything I can do programmatically to deal with this? We also provide a D3D12 backend, so it’s not the end of the world, but it does mean the Vulkan backend is pretty much unusable on Windows, which is a shame because it is otherwise better in most ways, especially in terms of shader performance.

Bye,
Mark

OK I’ve found a solution for this, but I think it still suggests there’s a driver bug.

My app is actually written using the ‘Dawn’ library, which is a C++ implementation of the WebGPU API by Google.

Dawn offers D3D12, D3D11 and Vulkan backends; the D3D12 and D3D11 backends work well, but the Vulkan backend exhibits this problem.

Anyway, the ‘fix’ in Dawn is to use the WebGPU Queue.OnSubmittedWorkDone method to pause rendering until all of the previous frame’s work has completed, before submitting any new work. This has the effect of adding a ‘vsync’ to the app, and rendering remains smooth and lag-free.
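For anyone who wants to try the same thing, here’s a rough sketch of the idea (not our actual code, and the exact signature of OnSubmittedWorkDone has changed between Dawn revisions - this assumes the older callback-plus-userdata form):

```cpp
// Sketch only: pace the CPU so at most one frame of GPU work is outstanding.
// Assumes the older Dawn callback-plus-userdata OnSubmittedWorkDone overload.
#include <webgpu/webgpu_cpp.h>
#include <atomic>

std::atomic<bool> gpuIdle{true};

void WaitForPreviousFrame(wgpu::Device device) {
    // Spin until the previous frame's work-done callback has fired.
    while (!gpuIdle.load()) {
        device.Tick();  // pump Dawn so callbacks can run
    }
}

void SubmitFrame(wgpu::Queue queue, wgpu::CommandBuffer commands) {
    gpuIdle.store(false);
    queue.Submit(1, &commands);
    queue.OnSubmittedWorkDone(
        [](WGPUQueueWorkDoneStatus /*status*/, void* userdata) {
            static_cast<std::atomic<bool>*>(userdata)->store(true);
        },
        &gpuIdle);
    // ... then present the swapchain / surface texture as usual ...
}
```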

I’ve checked the Dawn source, and it does request 2 swapchain images from Vulkan when the FIFO present mode is used, which as far as I know should have the same effect as ‘vsyncing’. But it looks like either the NVIDIA drivers are ignoring this, or this kind of pacing is something you’re meant to do yourself in Vulkan (I’ve never used it directly) and it’s actually a bug in Dawn. Does anyone know?

I’m not sure what the equivalent call is in Vulkan, but if you’re one of the many people who appear to be having problems similar to this, I suggest giving this a try.

Hmmm… so is this how Vulkan is supposed to work? Is this expected behaviour or a driver bug? Is anyone from NVIDIA actually reading anything posted here?!?

Hi Mark,

Welcome to the NVIDIA Developer Forums, thank you for posting this issue.

What you are describing sounds like your application is queueing many frames in advance. Since your present queue is large, it seemingly takes a lot of time for your application to react to user input.

If you are using VK_PRESENT_MODE_FIFO_KHR without any sort of pacing implemented on the CPU side, this would indeed be the expected behavior. Neither vkAcquireNextImageKHR (which I think implements SurfaceTexture::GetCurrentTexture) nor vkQueuePresentKHR (SurfaceTexture::Present) will generally block waiting for your present queue to drain before adding a new presentation request. Some implementations do block, while others don’t. As you have mentioned, even the same driver can behave differently under different conditions.

Queue.OnSubmittedWorkDone, I assume, synchronously waits for your submitted GPU work (via vkQueueSubmit) to be done. Since your swapchain only has 2 images, you are indeed “vsync’ing” yourself, but this could end up being a bad user experience if it takes more than one presentation refresh cycle for your application to generate a new frame.

Frame pacing can be a complicated subject. Looking at the Dawn samples (src/dawn/samples/SampleUtils.cpp - dawn - Git at Google), we can see that they perform a sleep between frames, which is a very basic way of doing it.
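As an illustration only (this is not the sample’s actual code), a sleep-based pacer can be as simple as:

```cpp
// Illustrative sleep-based frame pacer; targetFrameTime would be the display
// refresh period (e.g. roughly 16.7 ms for a 60 Hz display).
#include <chrono>
#include <thread>

void PaceFrame(std::chrono::steady_clock::time_point& lastFrameTime,
               std::chrono::microseconds targetFrameTime) {
    auto elapsed = std::chrono::steady_clock::now() - lastFrameTime;
    if (elapsed < targetFrameTime) {
        std::this_thread::sleep_for(targetFrameTime - elapsed);
    }
    lastFrameTime = std::chrono::steady_clock::now();
}
```

This keeps the CPU from running too far ahead of the display, but it does not adapt to how long the GPU or the presentation engine actually take, which is where the extensions below come in.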

When using Vulkan directly, VK_KHR_present_wait is a good way to implement frame pacing, though it is an extension that may not be available everywhere. VK_NV_low_latency2 is exposed on NVIDIA hardware and provides stronger guarantees. Crucially though, I don’t see any of these used in Dawn.
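For reference, a minimal sketch of present-wait pacing looks roughly like this (assuming VK_KHR_present_id and VK_KHR_present_wait are both enabled on the device; error handling and the rest of the frame loop are omitted):

```cpp
// Sketch only: tag each present with an increasing id, then block the CPU
// until the previous present has actually reached the display.
#include <vulkan/vulkan.h>
#include <cstdint>

uint64_t gFrameId = 0;  // monotonically increasing present id

void PresentPaced(VkDevice device, VkQueue queue, VkSwapchainKHR swapchain,
                  uint32_t imageIndex, VkSemaphore renderFinished) {
    ++gFrameId;

    VkPresentIdKHR presentId{};
    presentId.sType = VK_STRUCTURE_TYPE_PRESENT_ID_KHR;
    presentId.swapchainCount = 1;
    presentId.pPresentIds = &gFrameId;

    VkPresentInfoKHR presentInfo{};
    presentInfo.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
    presentInfo.pNext = &presentId;
    presentInfo.waitSemaphoreCount = 1;
    presentInfo.pWaitSemaphores = &renderFinished;
    presentInfo.swapchainCount = 1;
    presentInfo.pSwapchains = &swapchain;
    presentInfo.pImageIndices = &imageIndex;
    vkQueuePresentKHR(queue, &presentInfo);

    // Keep at most one present in flight; wait on an older id instead if you
    // prefer deeper queuing over latency.
    if (gFrameId > 1) {
        vkWaitForPresentKHR(device, swapchain, gFrameId - 1, UINT64_MAX);
    }
}
```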

In general, I recommend using tools like NVIDIA Nsight Systems to debug latency issues, as it shows present packets and which queue submission they correlate with, along with the CPU timeline that generated them.

I hope that helps,

Lionel

Hey, thanks for the reply!

I think I understand most of what you’re saying; what I don’t quite understand is why acquiring images from the swapchain isn’t naturally vsyncing my app. Dawn creates a 2-image swapchain, and according to this…

…swapchain images only become available on the vertical blank, as that is when images are moved from the ‘presented’ queue to the ‘available’ queue. So if my app is going render/acquire/present, render/acquire/present, etc… what is acquire returning on the 2nd (or 3rd?) call if it’s not blocking? And if it is blocking, won’t that prevent me from adding more presents to the render queue?

The app is kind of behaving the way I would expect a 60-ish image swapchain to work.

Bye!
Mark

So if my app is going render/acquire/present, render/acquire/present, etc… what is acquire returning on the 2nd (or 3rd?) call if it’s not blocking?

Because your application is using VK_PRESENT_MODE_FIFO_KHR, the order in which images will be acquired can be trivially predicted by the driver. In your case, with two swapchain images, presents should simply alternate between the two images on most implementations.

But having successfully acquired an image index to record your command buffers against does not mean the image is ready to be written to - it can still be on screen, or indeed not even presented yet. Your GPU needs to wait for the vkAcquireNextImageKHR semaphore to be signaled before it starts drawing to the image. I see in the Dawn source code that a new semaphore is created for each acquire. And because you are in vsync mode, if you have a lot of presents queued, this creates a lot of latency, which I think explains your issue.
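To make that concrete, a bare-bones FIFO frame looks something like this (a generic sketch, not what Dawn does internally; per-frame semaphore/fence management is omitted):

```cpp
// Sketch only: acquire returns an image index immediately, and the semaphore
// is what actually gates GPU access to that image.
#include <vulkan/vulkan.h>
#include <cstdint>

void DrawOneFrame(VkDevice device, VkQueue queue, VkSwapchainKHR swapchain,
                  VkSemaphore imageAvailable, VkSemaphore renderFinished,
                  VkCommandBuffer cmd) {
    uint32_t imageIndex = 0;
    // Returns an index right away - the image may still be on screen.
    vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
                          imageAvailable, VK_NULL_HANDLE, &imageIndex);

    // The GPU waits on imageAvailable before writing to the image; the CPU
    // is free to keep queuing more frames, which is where latency builds up
    // if nothing paces it.
    VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
    VkSubmitInfo submit{};
    submit.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submit.waitSemaphoreCount = 1;
    submit.pWaitSemaphores = &imageAvailable;
    submit.pWaitDstStageMask = &waitStage;
    submit.commandBufferCount = 1;
    submit.pCommandBuffers = &cmd;
    submit.signalSemaphoreCount = 1;
    submit.pSignalSemaphores = &renderFinished;
    vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);

    VkPresentInfoKHR present{};
    present.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
    present.waitSemaphoreCount = 1;
    present.pWaitSemaphores = &renderFinished;
    present.swapchainCount = 1;
    present.pSwapchains = &swapchain;
    present.pImageIndices = &imageIndex;
    vkQueuePresentKHR(queue, &present);
}
```

Notice that nothing in this loop blocks the CPU; the semaphores only order the GPU and presentation work, which is exactly why an unpaced render loop can accumulate several frames of latency.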

Hopefully that makes sense. I encourage you to read the full Vulkan specification around vkAcquireNextImageKHR and other related functions. It can seem daunting but the concepts described there are important to understand when working with swapchains.

Lionel