Negative latencies

frankchen8508 · November 27, 2024, 12:02am

Hi,

I’m using CUPTI to measure the latency between the CPU issuing a kernel launch and the kernel starting to execute. I take a timestamp at the entry of CUPTI_RUNTIME_TRACE_CBID_cudaLaunchKernel_v7000 by calling cuptiGetTimestamp. I then take the start timestamp of CUpti_ActivityKernel9 (from CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL) and subtract the callback timestamp from it. As a cross-check, I also retrieve the start timestamp of CUpti_ActivityAPI from CUPTI_ACTIVITY_KIND_RUNTIME, using correlationId to match the records. The attached image shows the results.
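The matching step described above can be sketched offline as follows. This is only an illustration in Python, not CUPTI code: the three dictionaries stand in for timestamps already collected from the callback and from the two activity record kinds, and the function name, key names, and sample values are all hypothetical.

```python
def launch_latencies(callback_ts, api_start_ts, kernel_start_ts):
    """Join three {correlation_id: timestamp_ns} maps by correlation ID.

    Returns {correlation_id: {...deltas in ns...}} for IDs present in all
    three maps. A negative delta means the kernel start timestamp is earlier
    than the host-side timestamp it is compared against.
    """
    rows = {}
    for cid, kernel_start in kernel_start_ts.items():
        # Skip kernels for which one of the host-side timestamps is missing.
        if cid not in callback_ts or cid not in api_start_ts:
            continue
        rows[cid] = {
            "kernel_minus_callback_ns": kernel_start - callback_ts[cid],
            "kernel_minus_api_ns": kernel_start - api_start_ts[cid],
        }
    return rows


# Synthetic values for illustration only (not taken from a real trace).
sample = launch_latencies(
    callback_ts={1: 100, 2: 200},
    api_start_ts={1: 90},
    kernel_start_ts={1: 150, 2: 260},
)
```

With these synthetic inputs, only correlation ID 1 appears in all three maps, so ID 2 is dropped from the result.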

For correlationId 23679 (not the cbid in the image), the kernel start time from CUpti_ActivityKernel9 is 1732487239483534075. The timestamp from the entry of the callback function is 1732487239483576598. The start time of CUpti_ActivityAPI is 1732487239483561019. The start timestamp of the kernel is lower than the other two timestamps. However, it seems impossible for kernel execution to begin before its corresponding runtime API call. Why does this happen? This issue occurs frequently in our application.
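To make the size of the discrepancy concrete, here is the arithmetic on the three timestamps quoted above, written as a small Python snippet. The variable names are made up for illustration, and the values are assumed to be in nanoseconds.

```python
# Timestamps quoted above for correlationId 23679 (assumed to be nanoseconds).
kernel_start_ns = 1732487239483534075    # CUpti_ActivityKernel9 start
callback_entry_ns = 1732487239483576598  # cuptiGetTimestamp at callback entry
api_start_ns = 1732487239483561019       # CUpti_ActivityAPI start

# Launch latency as measured against each host-side timestamp.
kernel_vs_callback_ns = kernel_start_ns - callback_entry_ns
kernel_vs_api_ns = kernel_start_ns - api_start_ns

print(kernel_vs_callback_ns)  # -42523
print(kernel_vs_api_ns)       # -26944
```

So the kernel start is reported about 42.5 µs before the callback timestamp and about 26.9 µs before the runtime API start.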

I’m using CUDA Toolkit 12.4. The application is executed using the torchrun command, which involves multi-threading.

veraj · December 2, 2024, 3:27am

Hi, @frankchen8508

Thanks for reporting this issue. This looks like an issue we recently fixed. Could you please try the latest CUDA 12.6 Update 2 toolkit? Thanks!
