How can I measure the computing time on the GPU?

I use a GTX295.
I want to compare the computing time of the same function on the CPU and on the GPU, but the GPU time I measure is always zero.
Why? What should I do?

Part of the code follows.

#include <ctime>
#include <iostream>
using namespace std;

void myFunOnHost();
__global__ void myFunOnDevice();

int main()
{
    clock_t start, end;

    start = clock();
    myFunOnDevice<<<…>>>();
    end = clock();
    cout << (end - start) << endl;

    start = clock();
    myFunOnHost();
    end = clock();
    cout << (end - start) << endl;
}

By the way, is it right that CLOCKS_PER_SEC = 1000000?

Kernel launches are asynchronous (i.e. the launch returns immediately and the CPU continues while the GPU runs in the background). If you want to time how long your kernel takes, you need to call cudaThreadSynchronize() before measuring the end time.

Not sure about CLOCKS_PER_SEC. I use gettimeofday() for time measurements.
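To make the point about asynchronous launches concrete, here is a minimal sketch that times a launch with gettimeofday(), once right after the launch returns and once after synchronizing. The kernel scaleKernel, the helper wallSeconds() and the problem size are hypothetical stand-ins, not the original poster's code. It also assumes a toolkit that has cudaDeviceSynchronize(), which replaced the now-deprecated cudaThreadSynchronize(); on an older toolkit, substitute the older call.

```cuda
#include <cstdio>
#include <sys/time.h>
#include <cuda_runtime.h>

// Hypothetical stand-in for myFunOnDevice: doubles each element.
__global__ void scaleKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Wall-clock time in seconds, taken from gettimeofday().
static double wallSeconds()
{
    timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main()
{
    const int n = 1 << 20;  // assumed problem size
    float *d_data = NULL;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // Warm-up launch so one-time initialization is not counted.
    scaleKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();

    double t0 = wallSeconds();
    scaleKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    double tLaunch = wallSeconds();  // launch has returned; kernel may still be running
    cudaDeviceSynchronize();         // wait until the kernel has finished
    double tDone = wallSeconds();

    printf("launch only:     %.3f ms\n", (tLaunch - t0) * 1e3);
    printf("launch + kernel: %.3f ms\n", (tDone - t0) * 1e3);

    cudaFree(d_data);
    return 0;
}
```

The first number only covers the cost of queuing the launch; the second includes the kernel itself, which is the figure to compare against the CPU version.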


Thank you.

After adding cudaThreadSynchronize() before the end time, the GPU time I got was still unstable. Sometimes it is greater than zero, but mostly it is zero.

The CPU time is always the same.

I guess this value may not be the real time. Do you know how to get the real GPU time?

Ah, reading the manual page for clock() suggests that it might not be the best measure, since it reports the amount of “processor time” used by the program rather than wallclock time. Try using gettimeofday(), just to see if that is a more reliable measure.
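The difference between processor time and wallclock time can be seen with a small host-only sketch, assuming a POSIX system. The helpers cpuSeconds() and wallSeconds() are hypothetical names, and the usleep() call merely stands in for any period in which the host thread waits without running on a CPU; it compiles as plain C++ or with nvcc.

```cuda
#include <cstdio>
#include <ctime>
#include <sys/time.h>
#include <unistd.h>

// Processor time consumed by this program so far, in seconds.
static double cpuSeconds()
{
    return (double)clock() / CLOCKS_PER_SEC;
}

// Wall-clock time in seconds, taken from gettimeofday().
static double wallSeconds()
{
    timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

int main()
{
    double cpu0 = cpuSeconds();
    double wall0 = wallSeconds();

    usleep(200000);  // wait 200 ms without doing any work on the CPU

    double cpu1 = cpuSeconds();
    double wall1 = wallSeconds();

    // The sleep shows up in the wall-clock delta but barely in the clock() delta.
    printf("clock():        %.3f ms\n", (cpu1 - cpu0) * 1e3);
    printf("gettimeofday(): %.3f ms\n", (wall1 - wall0) * 1e3);
    return 0;
}
```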

I use the event management calls to measure the time, which work pretty well: create a pair of events, record one before and one after the kernel, and read the elapsed time between them.
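For reference, a minimal sketch of event-based timing with the CUDA runtime API could look like the following. The kernel scaleKernel, the helper timeKernelMs() and the problem size are hypothetical and only there to have something to time; error checking is kept to a minimum.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel to be timed: doubles each element.
__global__ void scaleKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Times one launch of scaleKernel with CUDA events; returns milliseconds.
static float timeKernelMs(float *d_data, int n)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);                         // mark the stream before the kernel
    scaleKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop, 0);                          // mark the stream after the kernel
    cudaEventSynchronize(stop);                        // block until the stop event is reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);            // elapsed time between the two events

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 20;  // assumed problem size
    float *d_data = NULL;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    timeKernelMs(d_data, n);  // warm-up run, result discarded
    printf("kernel time: %.3f ms\n", timeKernelMs(d_data, n));

    cudaFree(d_data);
    return 0;
}
```

Because the events are recorded on the GPU's own timeline, this measures the kernel without depending on the resolution of a host timer.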

I found the following page helpful: High Resolution Timer, http://www.songho.ca/misc/timer/timer.html

I’m currently using QueryPerformanceCounter() together with QueryPerformanceFrequency() on my Windows box.

Just be aware that this function MAY behave erratically on multi-core CPUs. You need to pin your time-measuring thread to one core and then use it.
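A rough sketch of that approach on Windows, assuming the standard Win32 calls QueryPerformanceCounter(), QueryPerformanceFrequency() and SetThreadAffinityMask(), is shown below. The helper hiResSeconds() and the empty kernel are hypothetical, and the std::chrono fallback for non-Windows systems is my own addition so that the example builds elsewhere.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <chrono>
#endif

// Hypothetical kernel, present only so there is something to time.
__global__ void emptyKernel() {}

// Seconds read from a high-resolution counter.
static double hiResSeconds()
{
#ifdef _WIN32
    LARGE_INTEGER freq, now;
    QueryPerformanceFrequency(&freq);  // counter ticks per second
    QueryPerformanceCounter(&now);     // current tick count
    return (double)now.QuadPart / (double)freq.QuadPart;
#else
    // Assumed fallback, not part of the original suggestion.
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
#endif
}

int main()
{
#ifdef _WIN32
    // Pin the timing thread to core 0 (affinity mask bit 0).
    SetThreadAffinityMask(GetCurrentThread(), 1);
#endif
    double t0 = hiResSeconds();
    emptyKernel<<<1, 1>>>();
    cudaDeviceSynchronize();  // wait for the kernel before reading the end time
    double t1 = hiResSeconds();

    printf("elapsed: %.3f ms\n", (t1 - t0) * 1e3);
    return 0;
}
```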


Thank you all very much.

gettimeofday() works very well. The result is very close to that from cudaprofile.

https://www.cs.virginia.edu/~csadmin/wiki/i…_kernel_runtime

I found this link most useful; it uses CUDA's built-in timing functions!

Regards