Vulkan GPU Markers seem off

stadlbauerpascal · March 4, 2025, 2:53pm

Hi,
I am trying to understand the Vulkan GPU Markers visualization. As an example I will use Falcor's WhittedRayTracer, which executes a GBuffer-Pass followed by the WhittedRayTracer-Pass.
The CPU-Markers are as I would expect them.

But I do not get why the GPU Markers look this way.

It seems that the WhittedRayTracer-Pass is running alongside the GBuffer-Pass, but only for a very short duration.
I am of course debugging a different application, but the marker placement is always similar to this simple example.
I ran this with the latest drivers and the latest Nsight Systems on Windows and Linux.

hwilper · March 4, 2025, 10:14pm

@dofek

ushomroni · March 9, 2025, 5:30pm

The GPU workload marker ranges correspond to the execution timings of the individual commands inside the workloads (command buffers). They are produced by inserting timestamp queries (or a driver-level equivalent, in the case of Vulkan applications) into the command buffers and reading their output when the workload finishes executing.
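
To make the timestamp mechanism concrete, here is a minimal sketch of how raw begin/end timestamps could be turned into marker ranges. It assumes tick values like those written by vkCmdWriteTimestamp and a nanoseconds-per-tick factor like VkPhysicalDeviceLimits::timestampPeriod. The struct and function names are hypothetical, and this is only an illustration, not how Nsight Systems is implemented internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Raw begin/end GPU timestamps for one marker (hypothetical layout).
struct MarkerTicks {
    std::string name;
    std::uint64_t begin_ticks;
    std::uint64_t end_ticks;
};

// The same marker expressed in nanoseconds, relative to the earliest
// begin timestamp in the workload.
struct MarkerRangeNs {
    std::string name;
    double start_ns;
    double duration_ns;
};

// timestamp_period_ns is assumed to be nanoseconds per tick.
std::vector<MarkerRangeNs> to_ranges(const std::vector<MarkerTicks>& markers,
                                     double timestamp_period_ns) {
    std::vector<MarkerRangeNs> ranges;
    if (markers.empty()) return ranges;

    // Use the earliest begin timestamp as the zero point of the timeline.
    std::uint64_t base = markers.front().begin_ticks;
    for (const MarkerTicks& m : markers) base = std::min(base, m.begin_ticks);

    for (const MarkerTicks& m : markers) {
        assert(m.end_ticks >= m.begin_ticks);
        ranges.push_back({
            m.name,
            static_cast<double>(m.begin_ticks - base) * timestamp_period_ns,
            static_cast<double>(m.end_ticks - m.begin_ticks) * timestamp_period_ns});
    }
    return ranges;
}
```

Note that a range computed this way only reflects when the begin and end timestamps were written; it does not by itself say how much work ran between them.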

Some operations, like raster rendering (e.g. vkCmdDraw), will always execute sequentially, since the GPU only has a single graphics pipeline and a single graphics hardware queue.

Some operations, such as copy operations or compute operations, can sometimes be parallelized so that they run on the asynchronous compute and copy queues in the hardware. This may happen even if the workload, at the graphics API level, was placed inside a graphics / direct queue.

Ray-tracing operations can run on the async compute queues. So in this case, the GPU's internal scheduler decided that the ray tracing workload inside the WhittedRayTracer marker could be executed without waiting for the previous marker to finish, and ran it in parallel. Apparently it finished executing faster, so its end also came before the first marker ended.

If there is a resource dependency that should prevent this (i.e. the ray tracing workload depends on operations performed inside the first marker), perhaps your application is missing resource barriers between the two parts of the command buffer.

Another option could be to split the operation into two command buffers and set a fence object to synchronize between them - if you want to make sure the entirety of the first operation ended before the second one begins.

Speaking more broadly, if you are unsure how certain execution patterns came to be, another helpful ability of the tool is to show the Windows driver queues by activating WDDM trace. While this information is not aware of the Vulkan debug utils markers, you can select the main “GPU Workload” (green-colored) bar in the queue’s row and it will be correlated to the WDDM events that show it going through the scheduling and execution pipeline. The second option I mentioned before (using two command buffers, with or without a fence) will show this even more clearly, since they will be two separate workloads in that case.

Hope this helped you understand what is going on here and feel free to ask any follow-up questions if anything is still unclear.

Regards

stadlbauerpascal · March 10, 2025, 7:56am

Thank you for your detailed answer!

I do not think the WhittedRayTracer can run before the GBuffer has finished.
There are barriers in place between all major passes. I also checked timings in Nsight Graphics, which can be seen in the following screenshot:

The GBuffer and WhittedRayTracer show about the same execution times, which is very different from what Nsight Systems shows.
The barriers between passes are also visible there. As mentioned before, I picked the NVIDIA Falcor examples; while they can of course contain flaws, I consider them stable examples.
The following screenshot shows Nsight Systems again, with the arrow pointing to where the Tonemapper is:

This can of course never happen before the GBuffer has finished.

I have tried multiple programs and have seen similar results.
This kind of behavior can be observed inside a single command buffer.
The first pass seems to take as long as all executions combined, while the remaining passes take only a very short time (even if in reality they take much longer than the first pass) and appear to happen during the first pass.
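
As a rough way to check for this pattern outside the GUI, the sketch below lists every pair of marker ranges that overlap in time. It assumes the ranges have already been exported as start/end times in nanoseconds; the struct, the function name, and the sample values in the usage note are hypothetical and not taken from any Nsight Systems export format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One GPU marker range in nanoseconds (hypothetical layout).
struct MarkerSpan {
    std::string name;
    double start_ns;
    double end_ns;
};

// Returns the names of every pair of markers whose time ranges overlap.
// Ranges that merely touch (one ends exactly where the next starts)
// are not counted as overlapping.
std::vector<std::pair<std::string, std::string>>
find_overlaps(const std::vector<MarkerSpan>& spans) {
    std::vector<std::pair<std::string, std::string>> overlaps;
    for (std::size_t i = 0; i < spans.size(); ++i) {
        assert(spans[i].end_ns >= spans[i].start_ns);
        for (std::size_t j = i + 1; j < spans.size(); ++j) {
            const bool overlap = spans[j].start_ns < spans[i].end_ns &&
                                 spans[i].start_ns < spans[j].end_ns;
            if (overlap) overlaps.emplace_back(spans[i].name, spans[j].name);
        }
    }
    return overlaps;
}
```

For example, with made-up ranges GBuffer = [0, 1000], WhittedRayTracer = [100, 150] and ToneMapper = [1000, 1100], the only reported pair is (GBuffer, WhittedRayTracer). In the situation described above, every later pass would be reported as overlapping the first one.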

Can my recording settings be that wrong to produce such results?
