Hi,
When running the CUDA profiler on my code, can anything be said about the overhead of profiling? I notice that a run that takes 20 ms without the profiler takes around 25-30 ms with it.
- Is this overhead included in either the gpu_time or the cpu_time field of the profiler output?
(I imagine it works like this: the GPU time is unaffected, but there’s a small delay after each kernel execution while the profiler data is collected.)
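
One way I could test that guess is to time the kernel myself with CUDA events, once with the profiler enabled and once without, and compare the result with the reported gpu_time. Below is a minimal sketch of what I have in mind; `dummyKernel`, the array size, and the launch configuration are placeholders, not my real code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real workload (hypothetical).
__global__ void dummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = data[i] * 2.0f + 1.0f;
    }
}

int main()
{
    const int n = 1 << 20;          // assumed problem size
    const int threads = 256;        // assumed block size
    const int blocks = (n + threads - 1) / threads;

    float *d_data = NULL;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time initialization is not included in the timing.
    dummyKernel<<<blocks, threads>>>(d_data, n);
    cudaDeviceSynchronize();

    // Record events on the same stream immediately before and after the kernel.
    cudaEventRecord(start, 0);
    dummyKernel<<<blocks, threads>>>(d_data, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);     // wait until the stop event has completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);   // elapsed time in milliseconds
    printf("kernel time from events: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

If the event-based time stays about the same with and without the profiler while the total wall-clock time grows, that would suggest the extra cost sits between kernel launches rather than inside the measured GPU time.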