The GPU workload marker ranges correspond to the execution timings of the individual commands inside the workloads (command buffers). They are produced by inserting timestamp queries (or a driver-level equivalent, in the case of Vulkan applications) into the command buffers and reading their results when the workload finishes executing.
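For illustration, here is a minimal sketch of how such a timestamp-query bracket might look in Vulkan. All handles (`cmdBuf`, `queryPool`, `device`, `fence`) and the measured command are placeholders, not taken from your application:

```c
// Reset two query slots, then bracket the command with timestamps.
// (vkCmdResetQueryPool must be recorded outside a render pass.)
vkCmdResetQueryPool(cmdBuf, queryPool, 0, 2);
vkCmdWriteTimestamp(cmdBuf, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, queryPool, 0);
vkCmdDraw(cmdBuf, vertexCount, 1, 0, 0);  // the workload being measured
vkCmdWriteTimestamp(cmdBuf, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, queryPool, 1);

// ... later, after the fence for this submission has signaled:
uint64_t timestamps[2];
vkGetQueryPoolResults(device, queryPool, 0, 2, sizeof(timestamps),
                      timestamps, sizeof(uint64_t),
                      VK_QUERY_RESULT_64_BIT | VK_QUERY_RESULT_WAIT_BIT);
// Elapsed GPU time = (timestamps[1] - timestamps[0]) multiplied by the
// device's VkPhysicalDeviceLimits::timestampPeriod (in nanoseconds).
```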
Some operations, like raster rendering (e.g. vkCmdDraw), will always execute sequentially, since the GPU only has a single graphics pipeline and a single graphics hardware queue.
Other operations, such as copy or compute operations, can sometimes be parallelized so that they run on the hardware's asynchronous compute and copy queues. This can happen even if, at the graphics API level, the workload was submitted to a graphics/direct queue.
Ray-tracing operations can run on the async compute queues. In this case, the GPU's internal scheduler decided that the ray tracing workload inside the WhittedRayTracer marker could be executed without waiting for the previous marker to finish, and ran it in parallel. It also finished executing faster, which is why its end point comes before the end of the first marker.
If there is a resource dependency that should prevent this (i.e. the ray tracing workload depends on operations performed inside the first marker), your application may be missing resource barriers between the two parts of the command buffer.
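As a rough sketch, such a barrier between the raster work and the ray tracing dispatch might look like this, assuming VK_KHR_synchronization2 (or Vulkan 1.3). The stage and access masks here are placeholders; the correct ones depend on how your resources are actually written and read:

```c
// Make subsequent ray tracing shader reads wait for earlier
// color-attachment writes from the raster pass.
VkMemoryBarrier2 barrier = {
    .sType         = VK_STRUCTURE_TYPE_MEMORY_BARRIER_2,
    .srcStageMask  = VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT,
    .srcAccessMask = VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT,
    .dstStageMask  = VK_PIPELINE_STAGE_2_RAY_TRACING_SHADER_BIT_KHR,
    .dstAccessMask = VK_ACCESS_2_SHADER_READ_BIT,
};
VkDependencyInfo dep = {
    .sType              = VK_STRUCTURE_TYPE_DEPENDENCY_INFO,
    .memoryBarrierCount = 1,
    .pMemoryBarriers    = &barrier,
};
vkCmdPipelineBarrier2(cmdBuf, &dep);
```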
Another option is to split the work into two command buffers and use a fence object to synchronize between them, if you want to make sure the first operation has fully completed before the second one begins.
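A minimal sketch of that split, assuming both command buffers target the same queue; all handles (`queue`, `device`, `fence`, `rasterCmdBuf`, `rayTracingCmdBuf`) are placeholders:

```c
// Submit the first command buffer and signal a fence on completion.
VkSubmitInfo submit1 = {
    .sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO,
    .commandBufferCount = 1,
    .pCommandBuffers    = &rasterCmdBuf,
};
vkQueueSubmit(queue, 1, &submit1, fence);

// Wait on the host until the first workload has fully completed.
vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
vkResetFences(device, 1, &fence);

// Only now submit the second command buffer.
VkSubmitInfo submit2 = {
    .sType              = VK_STRUCTURE_TYPE_SUBMIT_INFO,
    .commandBufferCount = 1,
    .pCommandBuffers    = &rayTracingCmdBuf,
};
vkQueueSubmit(queue, 1, &submit2, VK_NULL_HANDLE);
```

A semaphore between the two submissions would avoid the host-side stall, but a fence wait makes the "first operation fully ended" guarantee the easiest to reason about, and also produces two clearly separate workloads in the trace.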
Speaking more broadly, if you are unsure how certain execution patterns came to be, another helpful capability of the tool is showing the Windows driver queues by activating WDDM trace. While this information is not aware of the Vulkan debug utils markers, you can select the main "GPU Workload" (green-colored) bar in the queue's row and it will be correlated with the WDDM events that show it going through the scheduling and execution pipeline. The second option mentioned above (using two command buffers, with or without a fence) will show this even more clearly, since in that case they will appear as two separate workloads.
Hope this helped you understand what is going on here and feel free to ask any follow-up questions if anything is still unclear.
Regards