Why is my application waiting for a semaphore every 5 frames or so?

Aleric · June 23, 2022, 2:14pm

I wrote a test application that simply draws a lot of triangles in order to make the GPU spend time in vkCmdDraw. I am even only using a single command buffer and fence (no double buffering).

All I want is to understand what is happening.

The application shows that (with certain settings) every so many frames there is a “slow” frame: while normally the time between a submit of the command buffer and the begin of execution on the GPU is microseconds (say, 73 microseconds); every couple of frames there are roughly 14 ms or so in between the submit and the GPU to actually start doing something?!

I determined this using Tracy - but that didn’t tell me anything about the why.

Then someone suggested Nsight. Capturing a single frame doesn’t help at all (that doesn’t even tell me how long the frame was); the only way I could see this stall is by using “GPU Trace Profiler” - capturing several frames and then hoping that the weird skip is present in that capture.

I managed to capture it this way, which you can see here:

The only information that this adds (or even gives) is that the GPU
is waiting for a semaphore: “Wait: Semaphore=4 Value=16777215” (which is 0xffffff by the way).

How can I find out which semaphore that is? And why isn’t it signaled for ~12.3 ms every now and then?

AMAMODE · June 23, 2022, 3:29pm

That’s the value GPU Trace uses when it doesn’t find a corresponding signal in the user signals. This is typically coming from system-generated signals, like the one used for the vkAcquireNextImageKHR.

Considering your times are just a bit higher than 16ms, I would look towards how your swapchain and vsync interact.

Aleric · June 23, 2022, 4:09pm

Thank you for you help! I tried to run my application in Nsight-systems instead, and at least that is showing the VSYNC - which is just a little more information.

Here is a screen capture of a part of Nsight-systems:

This shows that the GPU activity (a single vkCmdDraw, that always uses 14.3 to 14.4 ms) begins more or less at a VSYNC, as opposed to immediately after the submit.

For clarity: I am not looking for a bug. I am assuming that this is normal behavior.
But, could you explain to me why the CPU would wait for a VSYNC as opposed to immediately starting to execute the submitted command buffer? I just can’t think of any reason :/.

Aleric · June 23, 2022, 4:39pm

Here is a capture made with Tracy:

This shows that (with these settings) every other frame is ~18.3 ms and ~31.7 ms.

Since, apparently, it takes more than 16.67 ms for one frame, I can’t get a frame rate of 60; and it has to sync with the VSYNCs somehow - so getting alternating CPU-frame-lengths is to be expected [The skip causes each pair of frames to be 50 ms long - which is 1/3 of the frame rate of my monitor(s) and results in an average frame rate of 25 ms, or 40 FPS. If each frame duration was of equal length - then the frame rate would have to drop to 30 FPS. Having alternating short/long CPU-frame times seems best.]

So what is my problem? Well, I don’t understand why the GPU stalls after the submit until the next VSYNC instead of simply stalling in vkQueuePresentKHR. There must be some resource not available at the moment of the submit, otherwise it would never stop. Which resource is that?

AMAMODE · June 24, 2022, 6:58am

The resource that’s presumably not available is the swapchain buffer you’re wanting to write to. The OS uses swapchain buffers you’ve filled out when doing desktop composition or scanout to display, depending on exactly which present code your application is getting.
If it’s actively using the buffer you’re wanting to write to, It will not signal the corresponding semaphore until it’s done with it (which will typically happen around a vsync event). How many buffers do you have in your swapchain?

Aleric · June 24, 2022, 9:27am

I am using two swapchain images. What I did was port API without Secrets: The Practical Approach to Vulkan* - Part 1 to my own engine (that’s a near 100% rewrite; but the app still looks the same, with the sliders and so forth). I am not even sure if they controlled the number of swapchain images at all (ie, together with the number of "frame resources’), but in my case those two are separated. I am currently adding a slider that will allow me to change the number of swapchain images away from the (now) fixed 2.

PS In case anyone is interested in that code (it’s open source, of course); the essence of the application is now in this file: https://github.com/CarloWood/linuxviewer/blob/master/src/tests/Window.h#L431

ushomroni · June 26, 2022, 8:31am

As you already realized from the Nsight Systems timeline, the long (red) frames end on the VSYNC timing, which means the swap chain is being exhausted before the GPU can render all the commands that were sent to it in previous frames.

One way to change this is to make the value of minImageCount inside the VkSwapchainCreateInfoKHR parameter to vkCreateSwapchainKHR. For example, if you changed that value to 4 (and your system successfully allocates that many swap chain buffers), you would see the long (red) frame appear every 4 frames instead of every 2.

If this were a real application you were trying to improve the performance for, you would probably want to increase the amount of GPU work being performed, to use offscreen rendering, or simply to disable VSYNC.

Aleric · June 26, 2022, 8:55pm

Thank you for insights. I am not sure I can follow though :(. Why would it wait 4 frames, instead of just 1 or 2? Is this to “reset” the swap chain and have the maximum amount of free images again? Why not have a shorter hickup, more often. It seems more logical to just skip one VSYNC, and doing that four times more often, then to skip four VSYNCs every time, but less often, no?

AMAMODE · June 27, 2022, 9:44am

I’ve had a quick look at your code. It looks like you’re using a single semaphore for all acquires (I’m not entirely sure, I only browsed). If that’s the case, I don’t see how that can work properly: you really want a semaphore for each swapchain buffer, to know when they’re available.

Aleric · June 27, 2022, 10:00am

It’s a trick; I use N+1 semaphores - where N is the number of swapchain images. This is necessary because you already have to pass a semaphore before knowing what resource you’ll get (when calling acquire_next_image). What is done is to use the semaphore of the resource that was returned the previous frame (so you know it is unused in advance). The swap occurs in Swapchain::update_current_index.

Thus,

// m_acquire_semaphore is known to be unused.
uint32_t new_swapchain_index;
acquireNextImageKHR(vh_swapchain, timeout, m_acquire_semaphore, vh_fence, &new_swapchain_index);
m_resources[new_swapchain_index].swap_image_available_semaphore_with(m_acquire_semaphore);

Now m_resources[new_swapchain_index] uses the semaphore that was passed to acquireNextImageKHR, and m_acquire_semaphore contains the semaphore that belonged to the returned resource, and therefore is known to be unused.

system · July 11, 2022, 10:00am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Program performs far better with NSight Nsight Graphics opengl	9	1408	July 11, 2023
NSight: gpu trace profiler fails to capture trace on 4070 Nsight Graphics	20	111	April 11, 2025
Way to automatically capture a frame based on certain parameters, e.g. frame time threshold? Nsight Graphics	6	829	August 26, 2022
Missing Frames GPU - Hardware	4	670	April 10, 2024
"Pause and Capture" not working Nsight Visual Studio Edition	9	3198	April 10, 2014
GPU trace shows wrong timeline when using 2 GPUs Nsight Graphics vulkan	15	97	October 22, 2024
Severe user input lag in Vulkan on Windows Vulkan	5	877	July 26, 2024
[Graphics capture:] Fatal failed object lookup (api: Vulkan, id: 157 [vkCreateSwapchainKHR]) Nsight Graphics nsight , graphics	7	45	June 10, 2025
Nsight 5.4 can't capture frame anymore Nsight Visual Studio Edition	13	2356	December 7, 2017
Poor multithreading performance compared to DX12 Vulkan	17	5460	September 29, 2020

Why is my application waiting for a semaphore every 5 frames or so?

Related topics