timing kernel execution with clock()

ndv14 · June 30, 2009, 4:54pm

Hi,

I’m trying to time the execution of some kernels as accurately as possible on a C1060. I’ve used gettimeofday() and cudaEvents, which seem to be reasonably accurate, but I want to measure the execution of the kernels without any overhead of the kernel call, so I’ve turned to using clock() on the GPU.

This is my thinking so far: since I want to eliminate all overhead of the kernel launch, I need to call clock() on the GPU before and after the kernel's work. But __global__ functions have to return void, so in order to pass the number of clock ticks back to the CPU, I need to pass in a pointer to device memory and then do a cudaMemcpy back to the host.

What I have written so far looks like it should work, but the tick counts I am getting are obviously wrong. The variable I’m using to hold the tick count is of type clock_t. Is it possible that between the starting and ending readings the counter has wrapped around 0, since the tick count is so large?

This is the skeleton of my code:

[codebox]__global__ void my_kernel(clock_t *ticks)
{
    *ticks = clock();
    /* do some useful stuff */
    *ticks = clock() - *ticks;
}

int main()
{
    /* setup memory for timing */
    clock_t *h_ticks, *d_ticks;
    h_ticks = (clock_t*)malloc(sizeof(clock_t));
    cudaMalloc((void**)&d_ticks, sizeof(clock_t));

    /* call kernel and record time */
    set_array<<<dimGrid,dimBlock>>>(d_ticks);
    cudaThreadSynchronize();
    cudaMemcpy(h_ticks, d_ticks, sizeof(clock_t), cudaMemcpyDeviceToHost);
    times[k] = (double)(*h_ticks/CLOCKS_PER_SEC);
}[/codebox]

The times I’m getting are widely varied and some are negative, which leads me to believe that the tick count is wrapping. Could someone please tell me what I’m doing wrong?

Thanks

MisterAnderson42 · June 30, 2009, 4:59pm

See the CUDA SDK clock sample. It has a list of all the caveats you need to be aware of. One obvious issue in the code you posted is that you have a race condition there with every thread reading and writing *ticks simultaneously.
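
To make the race condition point concrete, here is a minimal sketch of a race-free layout. It is loosely modelled on the idea behind the SDK clock sample and is not a copy of it. The names `timed_kernel` and `max_block_ticks` and the stand-in workload are hypothetical. Only thread 0 of each block writes, and every block gets its own pair of slots. The ticks are kept as unsigned int on the assumption that the hardware counter is 32 bits wide, so that end minus start still comes out right if the counter wraps once between the two reads.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: timer[b] holds block b's start tick and
// timer[b + gridDim.x] its end tick. Only thread 0 of a block writes,
// so no two threads ever store to the same address.
__global__ void timed_kernel(float *data, int n, unsigned int *timer)
{
    if (threadIdx.x == 0) timer[blockIdx.x] = (unsigned int)clock();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 2.0f * data[i];   // stand-in for the real work

    // Wait for every thread in the block before taking the end reading.
    __syncthreads();
    if (threadIdx.x == 0) timer[blockIdx.x + gridDim.x] = (unsigned int)clock();
}

// Runs the kernel once and returns the largest per-block tick count,
// or 0 if any allocation or CUDA call fails.
unsigned int max_block_ticks(int blocks, int threads)
{
    int n = blocks * threads;
    size_t timer_bytes = 2 * blocks * sizeof(unsigned int);
    float *d_data = NULL;
    unsigned int *d_timer = NULL;
    unsigned int *h_timer = (unsigned int *)malloc(timer_bytes);
    unsigned int worst = 0;

    if (h_timer != NULL &&
        cudaMalloc((void **)&d_data, n * sizeof(float)) == cudaSuccess &&
        cudaMalloc((void **)&d_timer, timer_bytes) == cudaSuccess &&
        cudaMemset(d_data, 0, n * sizeof(float)) == cudaSuccess) {
        timed_kernel<<<blocks, threads>>>(d_data, n, d_timer);
        // cudaMemcpy does not return until the kernel has finished.
        if (cudaMemcpy(h_timer, d_timer, timer_bytes,
                       cudaMemcpyDeviceToHost) == cudaSuccess) {
            for (int b = 0; b < blocks; ++b) {
                // Unsigned subtraction stays correct across a single wrap.
                unsigned int ticks = h_timer[b + blocks] - h_timer[b];
                if (ticks > worst) worst = ticks;
            }
        }
    }
    cudaFree(d_data);
    cudaFree(d_timer);
    free(h_timer);
    return worst;
}
```

A call such as `max_block_ticks(4, 64)` returns a tick count, not a time. The counter is per multiprocessor, so only the difference between two readings taken by the same block is meaningful. Start and end values from different blocks should not be subtracted from each other.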

Regardless of clock(): As far as I know, if you use a CUDA event to measure a single kernel execution, then you are measuring exactly the kernel execution with no overhead. To verify, you could run your code through the profiler and compare the gputime and cputime values of the kernel launches with the cuda event readings.
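
For comparison, a minimal sketch of the event-based measurement described above. The kernel `scale_kernel` and the helper `time_kernel_ms` are hypothetical names. The warm-up launch is my own addition, on the assumption that the first launch may carry one-time setup cost that should not be counted.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel being timed.
__global__ void scale_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 2.0f * data[i];
}

// Returns the time between the two events in milliseconds,
// or -1.0f if something fails.
float time_kernel_ms(int blocks, int threads)
{
    int n = blocks * threads;
    float *d_data = NULL;
    cudaEvent_t start, stop;
    float ms = -1.0f;

    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess)
        return ms;
    cudaMemset(d_data, 0, n * sizeof(float));
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch (an assumption, see the note above).
    scale_kernel<<<blocks, threads>>>(d_data, n);

    // Both events go into the same stream as the kernel (stream 0),
    // so they are recorded on the GPU immediately before and after it.
    cudaEventRecord(start, 0);
    scale_kernel<<<blocks, threads>>>(d_data, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);   // block the host until stop is reached

    if (cudaEventElapsedTime(&ms, start, stop) != cudaSuccess)
        ms = -1.0f;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return ms;
}
```

The value written by cudaEventElapsedTime is a float in milliseconds, which matters when comparing it against numbers from other timers.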

ndv14 · July 6, 2009, 3:51pm

Thanks for your reply. I redid the timing portion of my code, basing it off the clock example from the SDK, and now I am getting much more realistic values, but I still have a concern. From what I have read on the forum, cudaEvent should be very accurate in timing the execution of code on the GPU, but the times I am getting from it are several hundred times longer than those from clock() and even quite a bit longer than those from gettimeofday().

Even though the clock() function gives me satisfactory accuracy, the cudaEvents would be much more convenient to use. Why does cudaEvent not seem to give me good results?

BeachHut · July 6, 2009, 4:08pm

I think clock() returns values in seconds, but the cudaEvent* functions return time in milliseconds, which would make the result seem 100 times bigger.

ndv14 · July 6, 2009, 4:50pm

Okay, thanks. That was a possibility I was hoping for, but I couldn’t find any information on it. Could you point me to a reference which confirms that?

MisterAnderson42 · July 6, 2009, 5:15pm

Try “man clock”. It actually returns a number where CLOCKS_PER_SEC == one second. And the CUDA programming guide or reference manual will tell you that events measure in milliseconds.

BeachHut · July 6, 2009, 5:17pm

Sorry, it would actually be 1000 times bigger.

For the cudaEvent one being in milliseconds, just look at the Cuda Reference Manual.

clock() isn’t mentioned in the Cuda Reference Manual. It’s most likely equivalent to the standard C clock() function, which (see man page) gives seconds.
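
A note on units for anyone comparing the two measurements: the CUDA Programming Guide describes the device-side clock() as returning a per-multiprocessor counter that is incremented every clock cycle, so inside a kernel it yields GPU cycles and not seconds. The host constant CLOCKS_PER_SEC does not describe the GPU clock. Also, in the original skeleton the expression `(double)(*h_ticks/CLOCKS_PER_SEC)` divides in integers before the cast. A minimal sketch of a conversion to milliseconds follows. The helper names `ticks_to_ms` and `device_clock_khz` are hypothetical, and the sketch assumes a toolkit recent enough to provide cudaDeviceGetAttribute (older toolkits exposed the same figure as the clockRate field of cudaDeviceProp).

```cuda
#include <cuda_runtime.h>

// Converts a device tick count to milliseconds.
// The clock rate is given in kHz, which is the same as ticks per millisecond.
double ticks_to_ms(unsigned int ticks, int clock_rate_khz)
{
    // Cast before dividing so the result is not truncated to an integer.
    return (double)ticks / (double)clock_rate_khz;
}

// Looks up the clock rate the runtime reports for a device, in kHz.
// Returns 0 on failure.
int device_clock_khz(int device)
{
    int khz = 0;
    if (cudaDeviceGetAttribute(&khz, cudaDevAttrClockRate, device) != cudaSuccess)
        return 0;
    return khz;
}
```

As a worked example, assuming a clock rate of 1,296,000 kHz (roughly 1.3 GHz), a kernel that reports 1,296,000 ticks took 1.0 ms, which is the unit cudaEventElapsedTime reports. The reported rate is the figure the runtime advertises for the device, so treat the converted time as an estimate.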
