CUDA-Kernel time measurement

Hello,

What is the best way to measure the execution time of CUDA kernels?

I use a C++ timer and cudaThreadSynchronize(), but is that the best way?

cudaThreadSynchronize();

timer.start();

kernel(...);

cudaThreadSynchronize();

timer.stop();
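For reference, the host-timer pattern above could be sketched roughly like this, assuming a C++11 compiler. HostTimer and time_ms are hypothetical names, not part of CUDA. The `work` callable stands in for the kernel launch and `sync` stands in for cudaThreadSynchronize() (or cudaDeviceSynchronize() in newer toolkits), so the sketch compiles without a GPU.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper, not part of the CUDA API: a wall-clock timer on the host.
struct HostTimer {
    std::chrono::steady_clock::time_point t0, t1;
    void start() { t0 = std::chrono::steady_clock::now(); }
    void stop()  { t1 = std::chrono::steady_clock::now(); }
    // Elapsed time between start() and stop(), in milliseconds.
    double elapsed_ms() const {
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }
};

// Times `work` in milliseconds.
// `sync` is called once before the timer starts (to drain earlier work)
// and once before it stops (to wait for `work` to finish), mirroring the
// two cudaThreadSynchronize() calls in the snippet above.
inline double time_ms(const std::function<void()>& work,
                      const std::function<void()>& sync) {
    assert(work && sync);  // both callables must be set
    HostTimer timer;
    sync();
    timer.start();
    work();   // assumption: in real code this would be the kernel launch
    sync();
    timer.stop();
    return timer.elapsed_ms();
}
```

In real code you would pass something like `[]{ cudaThreadSynchronize(); }` as `sync`. Without the second synchronize, the timer would mostly measure the launch overhead, since kernel launches return to the host before the kernel finishes.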

There are also CUDA events (cudaEvent_t).

Here’s a minimal example that makes use of them…

float memsettime;

cudaEvent_t start,stop;

cudaEventCreate(&start);

cudaEventCreate(&stop);

cudaEventRecord(start,0);

// CUDA code here

cudaEventRecord(stop,0);

cudaThreadSynchronize();

cudaEventElapsedTime(&memsettime, start, stop);

cudaEventDestroy(start);

cudaEventDestroy(stop);

I assume this is also reliable in timing non-CUDA code, right?

It depends, I guess. If your CPU program is multi-threaded, cudaThreadSynchronize() does no synchronization between the host threads, so you would also have to add your own barrier. Other than that I guess it would work, although the phrase "killing a fly with a bazooka" comes to mind, given the additional overhead of having to synchronize on the events just to read the timings.
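A rough sketch of what "your own barrier" could look like for multi-threaded CPU code, assuming C++11 threads and a plain host timer instead of events. time_threads_ms is a hypothetical helper and the counter loop is just a placeholder workload. Joining all threads acts as the barrier before the timer is stopped.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical helper: runs `n_threads` copies of a placeholder CPU workload
// and returns the elapsed wall-clock time in milliseconds.
inline double time_threads_ms(unsigned n_threads,
                              std::atomic<long>& counter,
                              long iters) {
    assert(n_threads > 0);
    auto t0 = std::chrono::steady_clock::now();

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n_threads; ++i) {
        pool.emplace_back([&counter, iters] {
            // Placeholder workload: each thread bumps a shared counter.
            for (long k = 0; k < iters; ++k)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }

    // The barrier: wait for every worker before reading the clock again.
    for (auto& t : pool) t.join();

    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Depending on the toolchain this may need to be built with thread support enabled (for example -pthread with GCC).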

Thank you!!!