I have a kernel that takes ~4 ms to execute, measured with the QueryPerformanceCounter and QueryPerformanceFrequency functions on Windows. According to CUDA's Visual Profiler, the GPU execution time is 70 us. Can kernels really have overhead in the ms range, or is there a problem with the profiler?
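For context, the host-side measurement was along these lines (a minimal sketch; the kernel name, launch configuration, and buffer size are placeholders, not the actual code):

```cuda
#include <windows.h>
#include <cstdio>

// Placeholder kernel standing in for the one being timed.
__global__ void myKernel(float *data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] *= 2.0f;
}

int main() {
    float *d_data;
    cudaMalloc((void **)&d_data, 256 * 256 * sizeof(float));

    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&start);
    myKernel<<<256, 256>>>(d_data);
    cudaThreadSynchronize();  // block until the kernel has finished
    QueryPerformanceCounter(&stop);

    double ms = 1000.0 * (stop.QuadPart - start.QuadPart) / freq.QuadPart;
    printf("kernel wall-clock time: %f ms\n", ms);

    cudaFree(d_data);
    return 0;
}
```

Note that this measures the full launch-plus-synchronize wall-clock time on the CPU, not just the GPU execution time the profiler reports.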
Version 0.2 of the Visual Profiler also shows the CPU time; does that value also differ so much from the QueryPerformanceCounter measurement?
For me, the output of the Visual Profiler has been very stable over time, so I have the feeling it can be trusted.
Well, if you give Nvidia the benefit of the doubt on the profiler, the next question is why some kernels have an overhead of ~40 us while others have ~4 ms. What could possibly increase the CPU latency and overhead by two orders of magnitude beyond normal for that kernel?
Do you have other threads or background tasks running? When profiling, there is an implicit thread synchronize after every kernel call which spin-waits with a thread yield (at least in CUDA 1.1, I’m not sure if this changed in 2.0 beta). If other threads are vying for CPU time, this can introduce a significant delay.
That said, 4 ms is a bit excessive. I've personally never seen kernel launch overhead of that magnitude.
I had a cudaThreadSynchronize() after the kernel invocation to prevent QueryPerformanceCounter from executing before the kernel finished. Unlike cudaStreamQuery, that call yields the processor. Now that I see what the problem is, it should have been really obvious to me.
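For anyone hitting the same issue: one way around the yield is to busy-poll cudaStreamQuery instead of calling cudaThreadSynchronize. A sketch, assuming the kernel was launched on the default stream (stream 0); the helper name and the usage snippet are illustrative, not from the original code:

```cuda
// Busy-wait until all work on stream 0 has completed.
// cudaStreamQuery returns cudaErrorNotReady while work is still
// pending, so spinning on it keeps the CPU busy instead of
// yielding the thread the way cudaThreadSynchronize does.
static void spinWaitDefaultStream() {
    while (cudaStreamQuery(0) == cudaErrorNotReady) {
        // deliberately empty: no Sleep() or yield, to keep latency low
    }
}

// usage (placeholder kernel and launch configuration):
// QueryPerformanceCounter(&start);
// myKernel<<<grid, block>>>(args);
// spinWaitDefaultStream();
// QueryPerformanceCounter(&stop);
```

The trade-off is that the spinning thread burns a full CPU core while waiting, so this is only appropriate when measurement latency matters more than CPU utilization.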