GPGPU Time Measurement


Is there a platform-independent way to measure the CPU time that all of the asynchronous and blocking CUDA/OpenCL functions take?

clock() measures consumed CPU time rather than wall-clock time, so a blocking cudaThreadSynchronize will not be measured correctly. Or am I wrong?

Note: I know how to use GPU timers and events, but I want to measure the overall execution time, since some people are not amused when the copies and kernel launches take 10 ms but the CPU takes 500 ms to map and unmap buffers and wait for blocking synchronization calls ;)

I am certain clock() won’t account for cudaThreadSynchronize.

On your first question: I actually use CUDA events to time both CPU and CUDA code. I compared CPU timings taken with CUDA events against the standard CPU timer, clock(), and the results were almost the same. So isn’t that the right way to time things? For OpenCL timing, I have no clue.

Actually no, since neither of those measures real (wall-clock) time…

Imagine your timer says your function takes 2 seconds and you launch it 100,000 times in a row: wouldn’t you be surprised if that takes a week instead of roughly 2.3 days?