After finishing my kernel (and all the code that supports it), I need to evaluate its performance by running a few benchmarks, and I was wondering which method you would recommend.
Currently I’m just subtracting the kernel's start time from its end time, in milliseconds, directly in the C++ code. I’m not sure, though, that this is the best method. I tried the CUDA profiler, but as far as I can tell, it only gives the timestamps at which each CUDA operation started…
But are events the best way to evaluate the overall performance? Right now, what I want to know is how long, on average, it takes to run the following code:
My question is whether I should use events or wall-clock time: although I’m measuring the kernel’s performance, I still need to compare my solution with a similar CPU-based one.
I usually use the CUDA profiler to measure execution time. In the report (*.csv) file, pay attention to the three columns “timestamp”, “gputime” and “cputime” (in microseconds).