Timers not timing...

s.lemieux · August 4, 2008, 7:41pm

I have a kernel launch and its associated memory transfers between a cutStartTimer / cutStopTimer pair, and I use cutGetTimerValue to obtain the time spent running the algorithm in ms. When I scale up the size of the problem, I consistently get back around 124 to 141 ms; the result is not proportional to the size of the problem. On the other hand, the wall clock time does increase linearly (as far as I can tell!) with the size of the problem.

Do I need to compile with dbg=1 to get proper timing? Is there anything I need to do to use the timers besides cutCreateTimer?

tmurray · August 4, 2008, 7:55pm

Add a cudaThreadSynchronize() after the kernel call if you haven’t done that already.

Ideally, use streams and events for timing, as they will be much more accurate for short kernels than CPU-based timers.
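
A minimal sketch of both suggestions, not the original poster's code. Kernel launches return to the CPU before the kernel finishes, so a CPU timer stopped right after the launch measures mostly launch overhead. The kernel `scale`, the array size, and the use of std::chrono in place of the cutil timers are assumptions for illustration. cudaDeviceSynchronize() is used here because later CUDA releases deprecated cudaThreadSynchronize() in its favor.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration: scales each element in place.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;                 // assumed problem size
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // 1) CPU-side timer: synchronize before reading the clock, otherwise
    //    the timer stops while the kernel is still running.
    auto t0 = std::chrono::steady_clock::now();
    scale<<<blocks, threads>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();               // cudaThreadSynchronize() in 2008-era CUDA
    auto t1 = std::chrono::steady_clock::now();
    double cpu_ms = std::chrono::duration<double, std::milli>(t1 - t0).count();

    // 2) GPU-side timing with events recorded in the default stream.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    scale<<<blocks, threads>>>(d_data, n, 2.0f);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);            // wait until the stop event has occurred
    float gpu_ms = 0.0f;
    cudaEventElapsedTime(&gpu_ms, start, stop);

    printf("CPU timer + sync: %.3f ms, events: %.3f ms\n", cpu_ms, gpu_ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

The two numbers will generally differ somewhat, since the CPU timer also includes launch and synchronization overhead.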

s.lemieux · August 4, 2008, 8:24pm

Thanks for the tip, cudaThreadSynchronize() did the job.

Now I’m wondering how many rookies like myself got lured into thinking their code was running that much faster on the GPU than a CPU implementation. Wouldn’t it be safer if cutStopTimer synchronized first, before taking the time?

I’ll look up streams and events!

cbuchner1 · August 4, 2008, 8:55pm

If it makes you happier: I was - that is kind of embarrassing actually ;)

dam5h · November 8, 2008, 5:21pm

Unfortunately this did not solve the problem for me. I am running the test within a loop that varies the sizes of the arrays on which I do simple math. The first couple of tests give realistic FLOPS values. Once the arrays surpass 10 million elements, the FLOPS values are way off. The test is obviously taking much longer, as I am waiting longer for the output, but the times returned are much lower than those of the faster earlier tests. Is there a time limit on the timer after which it restarts at 0? That could explain it. Otherwise I’m pretty confused.

Thanks,
Dave

dam5h · November 8, 2008, 5:34pm

I just now tried using events to record the kernel time, as in the simple streams example, and am getting the unrealistic FLOPS values once again on the longer arrays. I am clueless at this point; any help would be greatly appreciated.

Dave

dam5h · November 8, 2008, 7:34pm

I think I discovered my problem. The grid size that I was submitting to the kernel was too large in the tests that gave me unrealistic times. It is odd, though, that I still had to wait longer for it to do something; perhaps the memcpy was what I was waiting for. If I want to process a large 1D array that would exceed the max grid size, is it best to simply split it up, or is there a better alternative using one memcpy but multiple kernel invocations?

Thanks again,
Dave
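
A launch whose grid exceeds the device limits fails without running the kernel, which would be consistent with the very short times reported above. The following is a minimal sketch of how such a failure can be detected, not code from this thread. The kernel `noop` and the helper `launchOk` are hypothetical names, and the oversized grid is built deliberately from the limits the device reports.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel, used only to exercise the launch itself.
__global__ void noop() {}

// Hypothetical helper: reports whether the most recent launch succeeded.
static bool launchOk(const char *label)
{
    cudaError_t err = cudaGetLastError();      // errors detected at launch time
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();         // errors detected during execution
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", label, cudaGetErrorString(err));
        return false;
    }
    printf("%s succeeded\n", label);
    return true;
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("max grid size: %d x %d x %d\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    // A launch well within the limits.
    noop<<<dim3(1, 1), 64>>>();
    launchOk("valid launch");

    // One block too many in y: the launch is rejected and the kernel never runs,
    // so any timer wrapped around it would measure almost nothing.
    dim3 tooBig(1, (unsigned int)prop.maxGridSize[1] + 1u);
    noop<<<tooBig, 64>>>();
    launchOk("oversized launch");

    return 0;
}
```

Checking the error state after every launch in a benchmark loop makes this kind of silent failure visible before the timings are interpreted.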

Ailleur · November 8, 2008, 8:15pm

You could just launch a 2D grid and treat it as a big 1D one.

It is the same way you would index a 1D array as if it were 2D, with array[y*width+x]

Have your thread index be something like

const int idx = (blockIdx.y*blockDim.x*gridDim.x)+blockIdx.x*blockDim.x+threadIdx.x;
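
A minimal sketch of the 2D-grid approach as a complete program, under a few stated assumptions: the kernel `addOne`, the array size of 20 million elements, and the per-dimension grid limit of 65535 blocks are chosen for illustration (the actual limit should be read from cudaGetDeviceProperties). Because the grid is rounded up, it contains more threads than elements, so the kernel needs a bounds check.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element, treating a 2D grid as one long 1D grid.
__global__ void addOne(float *data, size_t n)
{
    // Same formula as above, computed in size_t to avoid overflow on large arrays.
    size_t idx = (size_t)blockIdx.y * gridDim.x * blockDim.x
               + (size_t)blockIdx.x * blockDim.x
               + threadIdx.x;
    if (idx < n)                       // the rounded-up grid has spare threads
        data[idx] += 1.0f;
}

int main()
{
    const size_t n = 20 * 1000 * 1000; // assumed size, too large for a 1D grid of 65535 blocks
    const int threads = 256;
    const size_t maxGridX = 65535;     // assumed per-dimension limit, for illustration

    // Number of blocks needed, folded into rows of at most maxGridX blocks.
    const size_t blocks = (n + threads - 1) / threads;
    dim3 grid;
    grid.x = (unsigned int)(blocks < maxGridX ? blocks : maxGridX);
    grid.y = (unsigned int)((blocks + grid.x - 1) / grid.x);

    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    addOne<<<grid, threads>>>(d_data, n);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy back and count elements that were not updated exactly once.
    std::vector<float> h_data(n);
    cudaMemcpy(h_data.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    size_t mismatches = 0;
    for (size_t i = 0; i < n; ++i)
        if (h_data[i] != 1.0f) ++mismatches;
    printf("grid %u x %u, mismatches: %zu\n", grid.x, grid.y, mismatches);

    cudaFree(d_data);
    return 0;
}
```

This keeps a single memcpy and a single launch; splitting the array across several launches with an offset argument is the alternative raised in the question.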
