How to measure time in CUDA?

I'm wondering how to measure elapsed time in CUDA…

I have a small test; the code looks like this:

t1=clock();
for(i=0;i<iteration_time;i++)
{
…call kernel_function…
}
t2=clock();

But t2-t1 does not give me the correct time: I always get a tiny number no matter how I change iteration_time… The same thing happens when I use cutStartTimer and so forth…

I think it is because after the CPU launches the GPU function, it returns without waiting for the result from the GPU. Has anyone else run into the same thing?

Thanks!

Shuai

Call cudaThreadSynchronize() before you start timing (to make sure that all previous CUDA tasks have completed). Call cudaThreadSynchronize() again right before the second timing call (to make sure that all the tasks you’re timing have completed).

Paulius
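For reference, the synchronize-before-and-after advice above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: busyKernel and timeLaunchesMs are made-up names standing in for the kernel_function and loop in the question, the buffer size and launch configuration are arbitrary, and std::chrono::steady_clock is swapped in for clock() so that wall-clock time is measured. It also calls cudaDeviceSynchronize(), which replaced cudaThreadSynchronize() in later toolkits after the latter was deprecated.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel_function in the question.
__global__ void busyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;
}

// Hypothetical helper: returns the wall-clock time in milliseconds taken by
// `iterations` kernel launches, or -1.0 if a CUDA call fails.
double timeLaunchesMs(int iterations)
{
    const int n = 1 << 20;            // arbitrary problem size
    float *d_data = NULL;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess)
        return -1.0;
    cudaMemset(d_data, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Wait for earlier GPU work (here, the memset) so it is not counted.
    cudaDeviceSynchronize();
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();

    for (int i = 0; i < iterations; i++)
        busyKernel<<<blocks, threads>>>(d_data, n);

    // Kernel launches return immediately; wait for the kernels to finish
    // before reading the clock the second time.
    cudaError_t err = cudaDeviceSynchronize();
    std::chrono::steady_clock::time_point t2 = std::chrono::steady_clock::now();

    cudaFree(d_data);
    if (err != cudaSuccess)
        return -1.0;
    return std::chrono::duration<double, std::milli>(t2 - t1).count();
}
```

Calling timeLaunchesMs(100) and then timeLaunchesMs(1000) from main should give times that grow with the iteration count. Without the second synchronize, the measurement would mostly reflect launch overhead rather than kernel execution.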

If I call two kernel functions one after the other without using cudaThreadSynchronize(), will these two kernel functions run on the device one after the other, or will they run concurrently on the device?

For example:

__global__ void myKernel1() {…}

__global__ void myKernel2() {…}

myKernel1<<<dimGrid,dimBlock>>>();

myKernel2<<<dimGrid,dimBlock>>>();

// NB: no cudaThreadSynchronize()

Will myKernel1 and myKernel2 run concurrently on the device (suppose they both take long enough)? Or is there some kind of queue that holds myKernel2 until myKernel1 is finished and then launches myKernel2?

Timtimac.

[q]these two kernel functions run in the device one after one, or will they run concurrently on the device[/q]
This has been discussed many times, why don’t you search for an answer?
In short: they will NOT run concurrently. myKernel2 will be launched only after myKernel1 has completed.
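
One way to see this ordering in practice is to make the second kernel depend on the output of the first and check the result. The following is a rough sketch, not a proof: the kernel bodies, the buffer, and the helper name kernelsRanInOrder are all made up for illustration (the kernels in the question take no arguments and their bodies are elided), and it assumes both launches go to the same default stream, as in the example above.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Illustrative kernel bodies (assumed): the first writes i into each element.
__global__ void myKernel1(int *buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        buf[i] = i;
}

// The second doubles whatever the first one wrote.
__global__ void myKernel2(int *buf, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        buf[i] = buf[i] * 2;
}

// Hypothetical helper: returns true if every element ends up as 2*i, which is
// what we expect when myKernel2 starts only after myKernel1 has finished.
// Returns false on a mismatch or on a CUDA error.
bool kernelsRanInOrder()
{
    const int n = 1 << 16;
    int *d_buf = NULL;
    if (cudaMalloc((void **)&d_buf, n * sizeof(int)) != cudaSuccess)
        return false;

    dim3 dimBlock(256);
    dim3 dimGrid((n + 255) / 256);

    myKernel1<<<dimGrid, dimBlock>>>(d_buf, n);
    myKernel2<<<dimGrid, dimBlock>>>(d_buf, n);
    // No explicit synchronize here: the blocking cudaMemcpy below is issued
    // to the same default stream and waits for both kernels to complete.

    std::vector<int> h_buf(n);
    cudaError_t err = cudaMemcpy(h_buf.data(), d_buf, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_buf);
    if (err != cudaSuccess)
        return false;

    for (int i = 0; i < n; i++)
        if (h_buf[i] != 2 * i)
            return false;
    return true;
}
```

If the two kernels overlapped, some elements could be doubled before myKernel1 had written them, and the check would be expected to fail for those elements.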