CUDA Profiler Cost? How much time is added and where?

kristleifur · May 7, 2009, 6:16pm

Hi,

when running code under the CUDA profiler, can anything be said about the cost of profiling? I notice that a 20 ms execution time becomes around 25-30 ms with the profiler enabled.

Is this extra time included in either the gpu_time or the cpu_time field of the profiler output?

(I imagine that it works like this: The GPU time is unaffected, but there’s a small delay after each kernel execution while the profiler data is gathered up.)

tmurray · May 7, 2009, 6:32pm

Not sure if the profiler still prevents overlap as of 2.2, but I think the way you describe it is basically correct for a single kernel.
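
One way to check the "GPU time unchanged, extra delay after each kernel" picture on your own setup is to time the same launches twice: once with CUDA events (device-side time) and once with a host wall clock, then run the program with and without the profiler and compare. The following is only a rough sketch under some assumptions: `busyKernel`, the problem size, and the launch count are hypothetical placeholders, and it uses a recent toolkit's runtime API (`cudaDeviceSynchronize`, C++11), not the 2.2-era one. The host-side number also includes the cost of recording the events, so treat the gap as an upper bound.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; substitute the kernel you are actually profiling.
__global__ void busyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = data[i];
        for (int k = 0; k < 1000; ++k) x = x * 1.0001f + 0.5f;
        data[i] = x;
    }
}

int main() {
    const int n = 1 << 20;       // assumed problem size
    const int launches = 100;    // assumed number of timed launches
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time initialization is not counted.
    busyKernel<<<blocks, threads>>>(d, n);
    cudaDeviceSynchronize();

    float gpuMs = 0.0f;   // sum of event-measured kernel times
    double wallMs = 0.0;  // sum of host-measured launch-to-completion times
    for (int i = 0; i < launches; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        cudaEventRecord(start);
        busyKernel<<<blocks, threads>>>(d, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);  // wait until the kernel has finished
        auto t1 = std::chrono::steady_clock::now();

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        gpuMs += ms;
        wallMs += std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    std::printf("avg GPU time per launch:  %.3f ms\n", gpuMs / launches);
    std::printf("avg wall time per launch: %.3f ms\n", wallMs / launches);
    std::printf("avg host-side gap:        %.3f ms\n",
                (wallMs - gpuMs) / launches);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}
```

If the description above holds, the GPU time per launch should stay roughly the same between the two runs, while the wall time and the host-side gap grow when the profiler is on.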