Hi,
In one of my programs, the CUDA timer and the CPU timer give different results for the same code. For example, with the CUDA timer:
cutCreateTimer(&gpu_timer);
cudaThreadSynchronize();
cutStartTimer(gpu_timer);
<<< Runkernels here >>>
cudaThreadSynchronize();
cutStopTimer(gpu_timer);
double gpu_time = cutGetTimerValue(gpu_timer) * 1e-3;
Now I use a CPU timer for the same thing:
clock_t gpu_start = clock();
<<< Runkernels here >>>
double gpu_time = ((double)clock() - gpu_start) / (CLOCKS_PER_SEC);
The GPU timer reports a very short time, say 10 ms, while the CPU timer reports around 200 ms.
Why is this happening?
CUDA timers are much more precise than clock(). clock() has coarser resolution, so timings taken with it are unreliable unless the measured interval is on the order of seconds. I’ve also had issues with using clock() in a multithreaded app under Linux.
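If you want to check the resolution claim on your own machine, here is a minimal sketch in plain C. It spins until the value returned by clock() changes and keeps the smallest step it sees. The helper name clock_step_seconds is made up for this post, and the result depends on the platform and C library, so treat it as a rough probe and not a benchmark. Also keep in mind that on POSIX systems clock() reports processor time used by the process, not elapsed wall-clock time.

```c
#include <time.h>

/* Hypothetical helper (name invented for this example):
   returns the smallest observed increment of clock(), in seconds.
   It busy-waits until clock() returns a new value and keeps the
   minimum step over `tries` attempts. */
static double clock_step_seconds(int tries)
{
    clock_t best = 0;
    for (int i = 0; i < tries; ++i) {
        clock_t t0 = clock();
        clock_t t1;
        do {
            t1 = clock();          /* spinning burns CPU, so clock() advances */
        } while (t1 == t0);
        if (best == 0 || (t1 - t0) < best)
            best = t1 - t0;
    }
    return (double)best / CLOCKS_PER_SEC;
}
```

Printing clock_step_seconds(20) * 1e3 gives the step in milliseconds. If that step is not far below the 10 ms you are trying to measure, clock() is the wrong tool for the job.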
To get proper timings for short tasks, use system-specific functions (gettimeofday on Linux, QueryPerformanceCounter on Windows) or a good timing library that wraps them (as the CUDA timers do, AFAIK).
I guess the cutil timing function cutGetTimerValue() is just a wrapper around QueryPerformanceCounter().
Is it?
If it works like NVIDIA’s shared utils for OpenCL (and it probably does), then it wraps QueryPerformanceCounter() on Windows and gettimeofday() on Linux. Both of these system-specific functions give high-resolution host timing (microsecond-level or better).
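For illustration, a wrapper of the kind described above might look roughly like this. This is only a sketch and not the actual cutil source: the function name wall_time_ms is invented here, and the real library may differ in its details.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/time.h>
#endif

/* Hypothetical helper (not part of cutil): wall-clock time in milliseconds.
   Only differences between two calls are meaningful, because the origin
   differs between the two platforms. */
static double wall_time_ms(void)
{
#ifdef _WIN32
    LARGE_INTEGER freq, now;
    QueryPerformanceFrequency(&freq);   /* counter ticks per second */
    QueryPerformanceCounter(&now);      /* current tick count */
    return (double)now.QuadPart * 1e3 / (double)freq.QuadPart;
#else
    struct timeval tv;
    gettimeofday(&tv, NULL);            /* seconds and microseconds since the epoch */
    return (double)tv.tv_sec * 1e3 + (double)tv.tv_usec * 1e-3;
#endif
}
```

To time GPU work with it, take one reading before the kernel launches and one after, and subtract. Kernel launches are asynchronous, so synchronize before each reading, the way the cudaThreadSynchronize() calls do in the first snippet of this thread.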