I executed the kernel 1000 times in a for loop, with one clock() call before the loop and one after it.
So 1000 * (kernel execution time) = 2nd clock() - 1st clock().
Nothing random is involved, so each run should be exactly the same and produce exactly the same result.
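For reference, here is a minimal sketch of that measurement, not my actual code. `myKernel`, the array size, the launch configuration and the helper `secondsPerLaunch` are all placeholders. The `cudaDeviceSynchronize()` before the second clock() is also an assumption: kernel launches are asynchronous, so without it the loop would mostly time the launch calls rather than the kernels.

```cuda
#include <cstdio>
#include <ctime>
#include <cuda_runtime.h>

// Placeholder kernel: stands in for the real kernel, which is not shown here.
__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 2.0f + 1.0f;
}

// Average seconds per launch, computed from two clock() readings.
static double secondsPerLaunch(clock_t start, clock_t end, int launches)
{
    return (double)(end - start) / CLOCKS_PER_SEC / launches;
}

int main()
{
    const int n = 1 << 20;        // placeholder problem size
    const int launches = 1000;    // loop count from the description above
    const int threads = 256;      // placeholder launch configuration
    const int blocks = (n + threads - 1) / threads;

    float *d_data = NULL;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    clock_t start = clock();                 // 1st clock()
    for (int k = 0; k < launches; ++k)
        myKernel<<<blocks, threads>>>(d_data, n);
    // Assumption: block until all queued kernels have finished,
    // because the launches above return before the GPU is done.
    cudaDeviceSynchronize();
    clock_t end = clock();                   // 2nd clock()

    printf("total: %.3f sec, per launch: %.6f sec\n",
           (double)(end - start) / CLOCKS_PER_SEC,
           secondsPerLaunch(start, end, launches));

    cudaFree(d_data);
    return 0;
}
```

The "total" figure printed here corresponds to the elapsed times I quote below.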
I ran the above program repeatedly, and the elapsed time varies from run to run. Strangely, although it varies, certain values seem much more likely than others. For example, in my case I observed 0.328 sec 20 times and 0.312 sec 23 times, while a few other times I observed 0.343 sec, 0.359 sec…
Why is that?
Some previous posts said a kernel should be warmed up before the ‘real execution’. How do I do this ‘warm up’? I guess not just any arbitrary kernel function can serve as a warm-up… right? So how do I correctly measure kernel execution time?
The Visual Profiler reports GPU time and CPU time; how far should I trust those values? For some kernels, the numbers from the Visual Profiler match the time measured with clock() well, while for others they are off by around 30%.