vkQueuePresentKHR takes too long, blocking frames

Hi, I am using Nsight Systems to profile performance and latency issues.

Once in a while, vkQueuePresentKHR blocks for a bit longer than usual and misses the intended vsync, even though the GPU workload and timeline look fine. I am wondering why.

My PC has an RTX 4090 and two monitors, and I notice that the longer vkQueuePresentKHR calls always extend to another vsync time point. Is this a problem?

I don't know the reason for this behavior, but there are a few things you can try to get more info:

  1. I notice that the thread that calls vkQueuePresentKHR appears to block during the call (as is usual), but during the long calls it then performs CPU-intensive work for ~7 milliseconds. You can look at the CPU sampling points collected during this period to see which module is performing this CPU work. Most probably it is the graphics driver.
  2. Enable GPU Metrics collection and capture another trace session. Look for memory transfer activity during the extended present calls - the app may be performing a lot of buffer migration.
  3. What present mode is the app using?

See also "Unclear blocking behavior in vkQueuePresentKHR", Issue #1158 in KhronosGroup/Vulkan-Docs on GitHub.


Thank you

  1. The CPU sampling points following the long call are mostly from nvlddmkm.sys.
  2. I did not notice suspicious memory transfer.
  3. FIFO, but it seems to happen in immediate mode too, just less often.

report3.nsys-rep (85.1 MB)

The Vulkan driver team confirmed a fix is in flight for this issue.