NVTX ranges in Threads and in CUDA do not align with each other

For a given process, its NVTX ranges appear in both the Threads row group and CUDA row group. However, the same NVTX range in the two row groups may not align with each other.

Take the [demo QDREP file](https://send.firefox.com/download/e949264371cca8f8/#2dSH42phQxpPcCRVTtWyWA) as an example. For the NVTX range layer0 beginning at 9.4918s, the range in Threads begins at 9.4918s and ends at 9.4923s, while the same range in CUDA begins at 9.4923s and ends at 9.50365s. I am confident the two are the same range, because both are highlighted together when I select either one of them.

I wonder whether this is a bug or intended behavior. How should we interpret such results?

Thank you very much for your concern.

NVTX is traced on the CPU side, so the timing you see in the Threads row is the actual time at which that code ran on the CPU. The range is then “projected” onto the GPU side so you can see the period of time during which the kernels launched inside this range actually ran on the GPU.

So the CPU row tells you when the call happened, and the GPU row tells you when the work in that range was active on the GPU.
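
To make the projection concrete, here is a small Python sketch of the idea. It is only a model of the behavior described above, not Nsight Systems' actual implementation. The `Kernel` record, the `project_range` helper, and the individual kernel timestamps are made up for illustration; only the range endpoints (9.4918s, 9.4923s, 9.50365s) come from the question. The assumption is that the projected range spans from the GPU start of the first kernel launched inside the CPU range to the GPU end of the last one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Kernel:
    """Hypothetical record of one kernel launch (all times in seconds)."""
    launch_cpu: float  # when the CPU thread issued the launch
    gpu_start: float   # when the kernel started executing on the GPU
    gpu_end: float     # when the kernel finished on the GPU


def project_range(cpu_start: float, cpu_end: float,
                  kernels: List[Kernel]) -> Optional[Tuple[float, float]]:
    """Project a CPU-side NVTX range onto the GPU timeline.

    Assumed rule: take every kernel launched while the range was open on
    the CPU, then span from the earliest GPU start to the latest GPU end.
    """
    inside = [k for k in kernels if cpu_start <= k.launch_cpu <= cpu_end]
    if not inside:
        return None  # no GPU work was launched inside this range
    return (min(k.gpu_start for k in inside),
            max(k.gpu_end for k in inside))


# Invented kernel timings, chosen to reproduce the numbers in the question.
kernels = [
    Kernel(launch_cpu=9.49185, gpu_start=9.4923, gpu_end=9.4970),
    Kernel(launch_cpu=9.49200, gpu_start=9.4970, gpu_end=9.50365),
    Kernel(launch_cpu=9.49300, gpu_start=9.50365, gpu_end=9.5100),  # launched after the range closed
]

cpu_range = (9.4918, 9.4923)  # layer0 as shown in the Threads row
gpu_range = project_range(cpu_range[0], cpu_range[1], kernels)
print("CPU range:", cpu_range)
print("GPU range:", gpu_range)
```

Because kernel launches are asynchronous, the CPU range can close long before the GPU finishes (or even starts) the work, which is why the two rows do not line up.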