I use a GTX295.
I want to compare the computing time of the CPU and the GPU for the same function, but the GPU time I measure is always zero.
Why? What should I do?
Kernel launches are asynchronous (i.e. the code allows the CPU to continue while the GPU runs in the background). If you want to time how long your kernel takes, you need to call cudaThreadSynchronize() before measuring the end time.
Not sure about CLOCKS_PER_SEC. I use gettimeofday() for time measurements.
After adding cudaThreadSynchronize() before the end time, the GPU time I get is still unstable. Sometimes it gives a value bigger than zero, but mostly it is zero.
The CPU time is always the same.
I guess this value may not be the real time. Do you know how to get the real GPU time?
Ah, reading the manual page for clock() suggests that it might not be the best measure, since it reports the amount of “processor time” used by the program rather than wall-clock time. Try using gettimeofday(), just to see if that is a more reliable measure.