Measuring Kernel Latencies


youssefelmougy May 5, 2021, 10:43am 1

I have a GPU application that launches approximately 2000 kernels during execution. What is the best way to measure the kernel launch latencies and the kernel receive latencies (the time between a kernel completing and the CPU starting to process its results)?

This can be done with the NVIDIA Nsight Systems visual profiler, but I would have to go through the kernels one by one to gather the data, so I am looking for a much more efficient way. Thanks!
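
As a rough illustration of gathering such numbers programmatically instead of kernel by kernel, here is a minimal sketch that assumes CUDA events plus a host clock are acceptable. `dummy_kernel`, `OverheadStats`, and `measure_overheads` are hypothetical names, not part of the application in question. Note the limitation: this only yields the combined launch plus completion-notification overhead (host round trip minus GPU time), not the two latencies separately. Error checking is omitted for brevity.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one of the application's ~2000 kernels.
__global__ void dummy_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

struct OverheadStats {
    int launches;
    double mean_api_us;       // host time spent inside the launch call itself
    double mean_gpu_us;       // GPU time between the start and stop events
    double mean_overhead_us;  // host round trip minus GPU time (launch + receive, combined)
};

using HostClock = std::chrono::steady_clock;

static double elapsed_us(HostClock::time_point a, HostClock::time_point b) {
    return std::chrono::duration<double, std::micro>(b - a).count();
}

OverheadStats measure_overheads(int launches) {
    const int n = 1 << 20;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time initialization does not skew the averages.
    dummy_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();

    double api_us = 0.0, gpu_us = 0.0, overhead_us = 0.0;
    for (int k = 0; k < launches; ++k) {
        auto t0 = HostClock::now();
        cudaEventRecord(start);
        auto t1 = HostClock::now();
        dummy_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
        auto t2 = HostClock::now();          // launch call has returned (asynchronously)
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);          // host blocks until the kernel has finished
        auto t3 = HostClock::now();          // host is now free to consume the results

        float gpu_ms = 0.0f;
        cudaEventElapsedTime(&gpu_ms, start, stop);

        api_us += elapsed_us(t1, t2);
        gpu_us += gpu_ms * 1000.0;
        overhead_us += elapsed_us(t0, t3) - gpu_ms * 1000.0;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);

    return {launches, api_us / launches, gpu_us / launches, overhead_us / launches};
}

int main() {
    OverheadStats s = measure_overheads(2000);
    std::printf("launches: %d\n", s.launches);
    std::printf("mean launch-call time: %.2f us\n", s.mean_api_us);
    std::printf("mean GPU time:         %.2f us\n", s.mean_gpu_us);
    std::printf("mean combined overhead: %.2f us\n", s.mean_overhead_us);
    return 0;
}
```

Synchronizing after every launch serializes the application, so numbers collected this way describe the isolated per-kernel overhead rather than the behavior of the original asynchronous pipeline.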
