Hi,
Recently, while profiling, I noticed that a pass that usually takes ~0.2 ms was taking ~4 ms when run through the Nsight GPU Trace profiler. Looking at the trace, each dispatch appears to wait for the previous one to complete before it is launched, as if there were a barrier between each pair (although, according to the trace, there are no barriers there).
Is this expected/intended/necessary behaviour for the profiler? Or is it perhaps a side effect of one of the settings in the profiler config?
Which graphics API are you using? Are you adding performance marker ranges around each dispatch?
I’m using Vulkan.
If the markers you’re referring to are the ones that appear in the “Markers” section of the trace, we only have them around each pass, not each dispatch.
It’s not expected. Which GPU are you seeing this on?
Also, what other API calls are there between the vkCmdDispatch calls?
So far I've seen this on an RTX 2080 and an RTX 4080.
The API calls shown are vkCmdBindPipeline, vkCmdBindDescriptorSets and vkCmdDispatch (x1000).
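Putting the timings from the first post together with the dispatch count, and assuming that all ~1000 dispatches belong to the pass being measured (an assumption; the thread doesn't say so explicitly), the slowdown works out roughly as follows:

$$
\frac{4\ \text{ms}}{0.2\ \text{ms}} = 20\times,
\qquad
\frac{4\ \text{ms} - 0.2\ \text{ms}}{1000\ \text{dispatches}} \approx 3.8\ \mu\text{s extra per dispatch},
\qquad
\frac{0.2\ \text{ms}}{1000\ \text{dispatches}} \approx 0.2\ \mu\text{s baseline per dispatch}
$$

In other words, each dispatch costs about 4 µs under the profiler versus about 0.2 µs normally. A roughly fixed per-dispatch cost like this is consistent with the dispatches being serialized, although the numbers alone cannot confirm the cause.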
Can you try disabling “Trace Shader Bindings”?
Hello, and sorry for the delay; I was on PTO. The problem persists with “Trace Shader Bindings” disabled. Here’s a screenshot of all my settings.
One unusual thing about our setup (though I hope it isn’t relevant) is that we use a D3D context to handle the final presentation at the end of the frame, while the rest of the frame is done in Vulkan.

