Job timing in CUDA

Hi everybody,

Is there a reliable function in CUDA that will record the total time a job takes to run? If it’s more than ‘beginner’ difficulty to implement, tips on how to apply it would be helpful.

As a secondary challenge, I will ultimately be running Fortran code, so if anyone has experience with timing/clocks in mixed Fortran/C, that would be great. Thanks,

Also, I am running in a Linux environment, and I just remembered that I can simply run $ time ./exec. Is that good enough as a profiler? I only need 0.1 second accuracy.

CUDA timers are included in cutil.h, the SDK’s utility header.
Check out the examples in the SDK; they’re often used there.

It’s supposedly not of the greatest precision, though.

Here is a snippet of code that times with CUDA events; cudaEventElapsedTime reports the result in milliseconds.


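cudaEvent_t start, stop;
float time1;  // elapsed time in milliseconds

// Create the timers
CUDA_SAFE_CALL( cudaEventCreate(&start) );
CUDA_SAFE_CALL( cudaEventCreate(&stop) );

// Mark the start on stream 0
CUDA_SAFE_CALL( cudaEventRecord(start, 0) );

// Thing to time…

// Mark the end, then wait until the stop event has actually been reached
CUDA_SAFE_CALL( cudaEventRecord(stop, 0) );
CUDA_SAFE_CALL( cudaEventSynchronize(stop) );

// time1 now holds the time between the two events
CUDA_SAFE_CALL( cudaEventElapsedTime(&time1, start, stop) );

// Release the events when done
CUDA_SAFE_CALL( cudaEventDestroy(start) );
CUDA_SAFE_CALL( cudaEventDestroy(stop) );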

For timing within Fortran, GPTL is hard to beat. You can either let it auto-instrument your functions (this still requires a couple of setup calls in your main program), or set up timing entries like this:

ret = gptlstart('computation1')
do …
ret = gptlstop('computation1')

You’ll get an entry in your GPTL output file telling you what happened between the start and stop calls. If you have PAPI installed, life gets even better (though PAPI can be more difficult to install).

Just remember:

  1. Add a cudaThreadSynchronize (renamed cudaDeviceSynchronize in CUDA 4.0 and later) before you stop your timer if you’re executing asynchronous stuff (kernel calls and async copies).
  2. By default, GPTL uses low precision timers. You’ll need to manually specify a different one (if necessary).

Most compilers these days have OpenMP flags as well. The timer there (omp_get_wtime()) is generally pretty good.
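If OpenMP is not an option, a similar wall-clock reading is available on Linux from POSIX clock_gettime. The following is only a minimal sketch: the helper name wallclock is made up for illustration, and the trailing-underscore wrapper assumes the common convention (used by gfortran by default, for example) of appending an underscore to external Fortran names. Check your compiler's documentation before relying on it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime under strict -std modes */
#endif
#include <time.h>

/* Hypothetical helper: seconds read from a monotonic wall clock.
   Only differences between two readings are meaningful. */
double wallclock(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Assumed Fortran binding: with the trailing-underscore convention,
   a Fortran call to wallclock() resolves to this symbol. */
double wallclock_(void)
{
    return wallclock();
}
```

On the Fortran side this would be declared as a double precision external function and called before and after the region of interest, with the difference giving elapsed seconds. As with any host timer, remember to synchronize with the GPU first if the timed region launches asynchronous work.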


Sweet, just what I needed.