Measurement of execution time

How do I measure the execution time of a CUDA program? I am using the time command in Linux, but is there a better way to measure the execution time (especially if I want to measure how long a particular function takes to execute)?

All the SDK examples measure the execution time of the kernel. You should be able to use the same functions (cutCreateTimer, cutStartTimer, cutStopTimer and cutGetTimerValue) to measure the execution time.

The CUDA timers are fine for the most part. If you really want high resolution though, take a look at clock_gettime(3): http://linux.die.net/man/3/clock_gettime

Yes, and you must include <cutil.h> in your source file if it isn't already included.

If you want really high-resolution timing of the kernel execution time, use the event API (cudaEventRecord and cudaEventElapsedTime); the timer resolution is that of the GPU clock.

But what you do depends entirely on what you want to achieve. Previous forum threads have covered this in much more detail, but here is the executive summary. If you want to know precisely how fast a single kernel launch is in order to tune just that kernel, then use the event API to measure its time. If you want a total running time to compare to another version of the code, or to provide a practical benchmark of how fast the overall system is, then measure the wall clock time it takes to run your whole program on a test data set.
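For the single-kernel case, the event-based measurement can be sketched roughly as follows. The kernel scale, the array size and the launch configuration are placeholders invented for this example; only the cudaEvent* calls are the point. Error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: multiplies every element by a constant.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;                      // arbitrary test size
    float *d_data = NULL;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);                  // timestamp before the kernel
    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f);
    cudaEventRecord(stop, 0);                   // timestamp after the kernel
    cudaEventSynchronize(stop);                 // wait until 'stop' is reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);     // elapsed time in milliseconds
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

The first launch of a kernel usually includes one-time setup cost, so for tuning it is common to run the kernel once untimed and then average over several timed launches.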