Comparing GPU vs CPU execution times


What’s the best method for recording execution times in order to compare GPU vs. CPU times?

Obviously GPU time is well measured by calling CUDA runtime functions, but is it possible to call these functions before and after CPU code in order to record CPU times?

Any other ideas?


CUDA events are by far the most reliable benchmarking method, but they only work in CUDA streams. Your best bet for benchmarking CPU code is whatever high-resolution timer your OS offers.
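As a rough sketch of the "high-resolution OS timer" approach for CPU code, here is a version built on the C++11 `std::chrono::steady_clock`. That clock is not mentioned above; it is used here on the assumption that a portable monotonic clock is acceptable, and the helper name `time_cpu_seconds` is made up for illustration.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename Work>
double time_cpu_seconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();  // the CPU code being measured
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

It would be called with a lambda wrapping the CPU version of the algorithm, e.g. `double s = time_cpu_seconds([&] { cpu_version(data); });`, where `cpu_version` and `data` stand in for your own code.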

I’ve had good luck with the CUDA utility toolkit’s cut[Create|Start|Stop|Restart]Timer functions (queried with cutGetTimerValue()). On Windows I believe they map to the QueryPerformanceCounter() API call, which is a bit slow to read but very, very precise (~0.5 us). On Linux, it looks like they map to the gettimeofday() function, but I don’t know the precision of that counter. If you don’t need that much precision on Windows, the GetTickCount() API is very fast but only has a precision of ~15 ms in my experience.

Hope that helps!


On Windows, I find QueryPerformanceCounter() very useful. I have attached a header file.

On Windows, you just need to do this:

#include <stdio.h>
#include "PerformanceCounter.h"

HPTimer profiler;

// ... code to be measured ...

printf("Time taken = %f\n", profiler.TimeInSeconds());

The timer is a high-resolution timer and can measure down to microseconds.

Note that the time measured is physical (wall-clock) time and might include interrupts, context switches, etc., so make sure no other process is running.

And you need to run it at least 5 times to get an average!
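The advice about repeating the measurement could be sketched roughly like this. It is only an illustration: `average_seconds` is a hypothetical helper, and it uses the standard `std::chrono::steady_clock` rather than the attached header so that it compiles on any platform.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls `work` `runs` times and returns the mean
// wall-clock time per call, in seconds. The default of 5 runs follows
// the suggestion above.
template <typename Work>
double average_seconds(Work&& work, int runs = 5) {
    assert(runs > 0);
    double total = 0.0;
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();  // the code being measured
        const auto stop = std::chrono::steady_clock::now();
        total += std::chrono::duration<double>(stop - start).count();
    }
    return total / runs;
}
```

Because the result is wall-clock time, a single run disturbed by another process will still pull the mean up, so it can be worth looking at the individual run times as well.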
PerformanceCounter.h (587 Bytes)

I cannot download the PerformanceCounter.h header.
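If the attachment stays unavailable, something along these lines might serve as a stand-in. This is not the original PerformanceCounter.h: the class name `HPTimer` and the method `TimeInSeconds()` are taken from the usage snippet above, but everything else is an assumption, including that the timer starts when it is constructed. It uses `std::chrono::steady_clock` instead of calling QueryPerformanceCounter() directly, and `Reset()` is an extra that the original may not have had.

```cpp
#include <chrono>

// Stand-in for the unavailable header, NOT the original code.
// Assumption: the timer starts on construction and TimeInSeconds()
// returns the wall-clock time elapsed since then.
class HPTimer {
public:
    HPTimer() : start_(std::chrono::steady_clock::now()) {}

    // Hypothetical convenience method: restart the measurement.
    void Reset() { start_ = std::chrono::steady_clock::now(); }

    // Seconds elapsed since construction or the last Reset().
    double TimeInSeconds() const {
        return std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start_).count();
    }

private:
    std::chrono::steady_clock::time_point start_;
};
```

With this in place of the header, the snippet earlier in the thread should work unchanged: construct an `HPTimer` just before the code to be measured and print `TimeInSeconds()` right after it.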