How do I measure elapsed time in CUDA?

I'm wondering how to measure elapsed time in CUDA…

I have a small test; the code looks like this:

…call kernel_function…

But I cannot get the correct time from t2-t1: I always get a tiny number, no matter how I change iteration_time. The same thing happens when I use cutStartTimer and so forth.

I think this is because, after the CPU launches the GPU function, control returns immediately without waiting for the GPU to finish. Has anyone else run into this?



  1. Cross-posting won’t get you an answer any faster
  2. clock() is a very low resolution timer; use gettimeofday instead.
  3. You are correct that kernel invocations are asynchronous. Read the programming guide. You can call cudaThreadSynchronize() to wait for all previous kernel calls to complete for timing purposes.
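
To make points 2 and 3 concrete, here is a minimal sketch rather than a definitive implementation. The kernel body, the problem size, and the helper names (wall_ms, time_with_sync_ms, time_with_events_ms) are all hypothetical stand-ins, since the original test code was not posted. It uses cudaDeviceSynchronize(), which replaces the now-deprecated cudaThreadSynchronize() in newer CUDA releases. The second function uses CUDA events, an alternative that was not mentioned above but is part of the standard runtime API.

```cuda
#include <cstdio>
#include <sys/time.h>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel in the question.
__global__ void kernel_function(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1.0f;
    }
}

// Host wall-clock time in milliseconds, from gettimeofday.
static double wall_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

// Host timer plus an explicit synchronize, so that the second timestamp
// is taken only after every queued kernel has finished.
double time_with_sync_ms(int n, int iterations)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float *d_data = NULL;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return -1.0;
    cudaMemset(d_data, 0, n * sizeof(float));

    // Warm-up launch so one-time initialization is not measured.
    kernel_function<<<blocks, threads>>>(d_data, n);
    cudaDeviceSynchronize();

    double t1 = wall_ms();
    for (int it = 0; it < iterations; ++it) {
        kernel_function<<<blocks, threads>>>(d_data, n);  // returns immediately
    }
    cudaDeviceSynchronize();  // without this, t2 - t1 covers only launch overhead
    double t2 = wall_ms();

    cudaFree(d_data);
    return t2 - t1;
}

// Same measurement using CUDA events, which are timestamped on the GPU.
float time_with_events_ms(int n, int iterations)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float *d_data = NULL;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int it = 0; it < iterations; ++it) {
        kernel_function<<<blocks, threads>>>(d_data, n);
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until the stop event has been reached

    float elapsed_ms = 0.0f;
    cudaEventElapsedTime(&elapsed_ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return elapsed_ms;
}

int main(void)
{
    const int n = 1 << 20;        // assumed problem size
    const int iterations = 100;   // assumed iteration count
    printf("host timer + sync: %.3f ms\n", time_with_sync_ms(n, iterations));
    printf("CUDA events:       %.3f ms\n", time_with_events_ms(n, iterations));
    return 0;
}
```

Assuming the file is saved as timing.cu, it should build with `nvcc timing.cu -o timing`. With the synchronize call in place, the reported time should grow with the iteration count instead of staying near zero.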