clock() function: how to calculate the processing time


here is an easy question ;-) I just wanted to make sure I understood the clock() function correctly.
If I want to know how much time the GPU spent processing some data, I surround the code with two calls to clock(), take the difference, and get the number of cycles used. Right?
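Just to make sure we are talking about the same pattern, here is a minimal sketch of what I mean (the kernel and variable names are only illustrative):

```cuda
__global__ void kernel(clock_t *elapsed, float *data)
{
    clock_t start = clock();     // timestamp before the measured code

    // ... the code being measured ...

    clock_t stop = clock();      // timestamp after the measured code
    if (threadIdx.x == 0)
        elapsed[blockIdx.x] = stop - start;   // cycle count per block
}
```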

This number of cycles represents the number of cycles a multiprocessor went through between the start and the end of that stretch of code. It includes the cycles actually dedicated to this code, plus any scheduling overhead, plus any other threads (warps) that were scheduled on the multiprocessor interleaved with this one. Right?

As these are GPU cycles, and the 8800 GTX hot clock runs at 1.35 GHz (1350 MHz), that means my code takes clock_difference/1350 microseconds to execute, right?

Isn’t there a function that reports how many cycles were dedicated to this thread alone?

Thanks for confirmation and help ;-)


Sorry, but that is all wrong. :no:

clock() reports the current value of a per-multiprocessor counter that runs at the ALU (shader) clock, not the core clock. Because it just reports the current count, the difference between two readings is subject to any delay a thread can experience from time slicing, scheduling, memory latency, etc. There is also interference with __syncthreads(). You can query the ALU clock speed with the 0.9 toolkit or newer.

To use it, you subtract the first timestamp from the second and treat the result as wall clock time, not processing time. That is, you must record the first start time and the last stop time of a block to get its total processing time. There has been a thread on this forum explaining it in a bit more detail; just search for it.
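A sketch of that pattern, similar to the clock sample shipped with the SDK (the array layout and the host-side reduction are assumptions, not a fixed API):

```cuda
// One start/stop pair per block, written by thread 0 of each block.
__global__ void timedKernel(clock_t *start, clock_t *stop, float *data)
{
    if (threadIdx.x == 0)
        start[blockIdx.x] = clock();   // first timestamp of this block

    // ... the work being measured ...

    __syncthreads();                   // make sure all threads are done
    if (threadIdx.x == 0)
        stop[blockIdx.x] = clock();    // last timestamp of this block
}

// Host side: copy start[] and stop[] back and take
//   max(stop[i]) - min(start[i])
// as the total wall clock time in ALU cycles. Note the counters are
// per-multiprocessor, so this is only meaningful for blocks that ran
// on the same multiprocessor.
```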


Thanks a lot for your reply Peter.
I tried to find that thread giving more details about how to use clock(), maybe the one by osiris and his experiments… not sure.

To get the ALU clock speed, you suggest using the 0.9 toolkit, but as far as I know it isn’t out yet. Do you happen to know the ALU clock speed of the 8800 GTX?

If I use the cutCreateTimer() function, can I get results as precise as with clock()? What’s the difference between the two?



Mark was talking about the clock domains here.

The 0.9 toolkit reports 1350000 kilohertz (i.e. 1.35 GHz) on an 8800 GTX.

cutCreateTimer() is a host function, as all the cutil functions are. This timer measures wall clock time on the CPU, so you can time CUDA calls with it, but you cannot measure anything that happens on the GPU itself.
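The usual pattern with the cutil timer looks roughly like this (a sketch only; remember that kernel launches are asynchronous, so you need a synchronize before stopping the timer or you only measure the launch):

```cuda
unsigned int timer = 0;
cutCreateTimer(&timer);
cutStartTimer(timer);

myKernel<<<grid, block>>>(d_data);   // asynchronous launch
cudaThreadSynchronize();             // wait for the GPU to finish

cutStopTimer(timer);
printf("GPU time: %f ms\n", cutGetTimerValue(timer));
cutDeleteTimer(timer);
```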


Great, thanks for your prompt replies Peter.