Kernel invocation cputime overhead?

First venture into CUDA (1.1; Linux). Still naive. (a) It does not crash, (b) the results are correct, and (c) it is occasionally faster than the CPU. Those aside, I was looking at the profiler log. The manual says that cputime includes the gputime of the kernel, but what else does it include? Preceding mallocs or memcopies? More importantly, is there anything I could do to squeeze out this overhead?

Curiously, the first call to “F3” (and “F4”) has more overhead than later calls, even though the data size and the block/thread configuration are the same on every call.

method=[ memcopy ] gputime=[ 3.776 ]
method=[ F1 ] gputime=[ 64.000 ] cputime=[ 124.000 ] occupancy=[ 1.000 ]
method=[ F2 ] gputime=[ 7.936 ] cputime=[ 64.000 ] occupancy=[ 1.000 ]
method=[ memcopy ] gputime=[ 2.944 ]
method=[ memcopy ] gputime=[ 3.712 ]
method=[ F1 ] gputime=[ 49.376 ] cputime=[ 100.000 ] occupancy=[ 1.000 ]
method=[ F2 ] gputime=[ 7.264 ] cputime=[ 52.000 ] occupancy=[ 1.000 ]
method=[ memcopy ] gputime=[ 2.816 ]
method=[ memcopy ] gputime=[ 4.480 ]
method=[ memcopy ] gputime=[ 3.872 ]
method=[ F3 ] gputime=[ 62.368 ] cputime=[ 113.000 ] occupancy=[ 0.667 ]
method=[ memcopy ] gputime=[ 38.272 ]
method=[ memcopy ] gputime=[ 3.488 ]
method=[ memcopy ] gputime=[ 3.808 ]
method=[ memcopy ] gputime=[ 3.680 ]
method=[ memcopy ] gputime=[ 2.880 ]
method=[ F4 ] gputime=[ 3.232 ] cputime=[ 52.000 ] occupancy=[ 1.000 ]
method=[ F3 ] gputime=[ 62.752 ] cputime=[ 84.000 ] occupancy=[ 0.667 ]
method=[ memcopy ] gputime=[ 37.760 ]
method=[ memcopy ] gputime=[ 3.360 ]
method=[ memcopy ] gputime=[ 3.072 ]
method=[ F4 ] gputime=[ 2.880 ] cputime=[ 26.000 ] occupancy=[ 1.000 ]
method=[ F3 ] gputime=[ 60.448 ] cputime=[ 84.000 ] occupancy=[ 0.667 ]
...
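One way to cross-check what the profiler's cputime includes would be to time a launch directly from the host. A minimal sketch, assuming a CUDA 1.x-era toolkit on Linux; the dummy kernel and sizes are placeholders, not anything taken from the log above:

#include <cstdio>
#include <sys/time.h>
#include <cuda_runtime.h>

// Placeholder kernel standing in for F1..F4; it does almost nothing.
__global__ void dummy(float *p)
{
    p[threadIdx.x] = 0.0f;
}

// Wall-clock time in microseconds.
static double usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e6 + tv.tv_usec;
}

int main(void)
{
    float *d;
    cudaMalloc((void **)&d, 256 * sizeof(float));

    for (int i = 0; i < 5; ++i) {
        double t0 = usec();
        dummy<<<1, 256>>>(d);        // the launch call returns as soon as the work is queued
        double t1 = usec();
        cudaThreadSynchronize();     // block until the kernel has actually finished
        double t2 = usec();
        printf("call %d: launch %.1f us, launch+run %.1f us\n",
               i, t1 - t0, t2 - t0);
    }

    cudaFree(d);
    return 0;
}

The first iteration often pays some extra one-time setup on top of the normal launch cost, which may be what shows up in the first F3/F4 calls above.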

Your overheads are typical. Presumably the cputime overhead covers setting up the argument list and the grid dimensions and passing them to the GPU over PCIe. I’ve noticed in my own testing that binding a texture before a kernel call adds another ~40 us of cputime, so everything associated with binding textures is also part of that.
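To make the texture point concrete: the bind is a host-side call issued before the launch, so its cost lands in cputime rather than gputime. A rough sketch using the legacy texture-reference API; the names here are made up:

#include <cuda_runtime.h>

// Legacy texture-reference API (CUDA 1.x era); texRef and scaleFromTex are made-up names.
texture<float, 1, cudaReadModeElementType> texRef;

__global__ void scaleFromTex(float *out, int n, float k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = k * tex1Dfetch(texRef, i);   // read the input through the texture
}

void launchScale(float *d_in, float *d_out, int n)
{
    // Host-side bind before the launch: this is the part that adds to cputime.
    cudaBindTexture(NULL, texRef, d_in, n * sizeof(float));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scaleFromTex<<<blocks, threads>>>(d_out, n, 2.0f);

    cudaUnbindTexture(texRef);
}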

This overhead is unfortunate, but it stays relatively constant as you increase the problem size on the GPU: even when the kernel takes milliseconds to complete, the cputime overhead is still only microseconds.
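Reusing the host-timer idea from the earlier sketch, one way to watch that fixed launch cost get amortized would be to time the same (made-up) kernel at increasing sizes after a warm-up launch:

#include <cstdio>
#include <sys/time.h>
#include <cuda_runtime.h>

// Made-up memory-bound kernel; the only point is that its runtime grows with n.
__global__ void scale(float *p, int n, float k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        p[i] *= k;
}

static double usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1e6 + tv.tv_usec;
}

int main(void)
{
    const int maxN = 1 << 22;
    float *d;
    cudaMalloc((void **)&d, maxN * sizeof(float));

    scale<<<1, 256>>>(d, 256, 2.0f);   // warm-up launch to absorb any one-time setup
    cudaThreadSynchronize();

    // The per-launch overhead stays roughly fixed, so its share of the total
    // shrinks as n (and the kernel's gputime) grows.
    for (int n = 1 << 12; n <= maxN; n <<= 2) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        double t0 = usec();
        scale<<<blocks, threads>>>(d, n, 2.0f);
        cudaThreadSynchronize();
        double t1 = usec();
        printf("n = %8d: %.1f us total\n", n, t1 - t0);
    }

    cudaFree(d);
    return 0;
}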