I’m quite new to CUDA and I need help with the correct way of measuring kernel execution time.
I have used both CUDA timers (the cutil timer functions) and CUDA events to measure time, and I got quite different results.
<b>TIMER CODE:</b>
cutStartTimer(hTimer);
KERNEL <<< >>>;
cudaThreadSynchronize();
cutStopTimer(hTimer);
elapsedtime = cutGetTimerValue(hTimer);
printf("Processing time: %.3f ms\n", elapsedtime);
<b>EVENT CODE:</b>
cudaEventRecord(start, 0);
KERNEL <<< >>>;
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);
cudaEventElapsedTime(&elapsedtime, start, stop);
printf("Processing time: %.3f ms\n", elapsedtime);
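For reference, a fuller, self-contained version of the event-based measurement I am doing looks roughly like this (the kernel name `myKernel` and the launch configuration are just placeholders, not my real code):

```cuda
#include <cstdio>

__global__ void myKernel() { /* placeholder kernel body */ }

int main()
{
    cudaEvent_t start, stop;
    float elapsedtime = 0.0f;

    // Events must be created before they can be recorded.
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);   // timestamped on the GPU, in stream 0
    myKernel<<<1, 256>>>();      // kernel launch is asynchronous on the host
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // block the host until the stop event has occurred

    cudaEventElapsedTime(&elapsedtime, start, stop);
    printf("Processing time: %.3f ms\n", elapsedtime);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```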
I have executed the code on two cards: a) TESLA C1060, b) GTX 470.
GTX 470 results
For TIMER approach I get 0.148 ms
For EVENT approach I get 0.007 ms
TESLA C1060 results
For TIMER approach I get 0.088 ms
For EVENT approach I get 0.083 ms
My understanding is that CUDA timers on Windows are plain high-performance counters stamped with HOST time, while CUDA events are timestamped on the GPU side. Therefore events show the actual execution time spent on the GPU. Is that right?
What bothers me is the crazy difference between the two approaches on the GTX 470 card. Is such a huge difference possible? If it is, I would kindly ask someone to explain it to me.
On the other hand, the C1060 results are almost identical, which is expected and OK.
Can you please clarify this for me?