[NSys Timeline] End - Start is not the same as latency

Here is a screenshot of a profiling session.

In the yellow box that appears when hovering over a kernel call in stream 15, end - start is 46.531 microseconds, but the latency is reported as 7.347 microseconds. Why are they not the same? What does end - start capture that latency does not? I also noticed that the latency in the yellow box matches the value shown for the corresponding kernel launch call in the CUDA API events log in NSys.

I think we have a terminology mismatch. Can you take a look at this post? https://developer.nvidia.com/blog/understanding-the-visualization-of-overhead-and-latency-in-nsight-systems/


Indeed a well-written piece; thanks for the article. It clears up my doubt. So latency is the time between when the API call was enqueued and when the GPU started executing the kernel, and the duration in the CUDA API trace is the CPU wrapper overhead.
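To make the distinction concrete for anyone reading later, here is a minimal sketch of how I understand the two numbers. It assumes end - start is measured from kernel start to kernel end on the GPU, and that latency is measured from the start of the launch API call to the kernel start. The absolute timestamps and the names (KernelEvent, api_start_ns, and so on) are made up for illustration; only the two differences come from my screenshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelEvent:
    """Hypothetical timestamps for one kernel launch, in nanoseconds."""
    api_start_ns: int     # assumed: when the launch API call began on the CPU
    kernel_start_ns: int  # when the GPU started executing the kernel
    kernel_end_ns: int    # when the GPU finished executing the kernel


def kernel_duration_ns(ev: KernelEvent) -> int:
    # "end - start" in the tooltip: time the kernel spent running on the GPU.
    return ev.kernel_end_ns - ev.kernel_start_ns


def launch_latency_ns(ev: KernelEvent) -> int:
    # "latency" in the tooltip: wait between the launch call and GPU start.
    return ev.kernel_start_ns - ev.api_start_ns


# Invented absolute times, chosen so the differences match the post.
event = KernelEvent(
    api_start_ns=1_000_000,
    kernel_start_ns=1_007_347,
    kernel_end_ns=1_053_878,
)

print(f"duration: {kernel_duration_ns(event) / 1000:.3f} us")
print(f"latency:  {launch_latency_ns(event) / 1000:.3f} us")
```

The two intervals do not overlap under this reading: one ends where the other begins, so there is no reason for them to be equal.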

Thanks, since I wrote it (with Jason and Bob).

