Measuring Kernel Performance

shiftreduce · September 28, 2009, 2:09pm

After finishing my kernel (and all the code that supports it), I need to evaluate its performance with a few benchmarks, and I was wondering which method you would recommend.

Currently I’m just subtracting the time at which the kernel starts from the time at which it ends, in milliseconds, directly in the C++ code. I’m not sure that this is the best method, though. I tried CUDA Prof, but as far as I can tell, it only gives the timestamps at which each CUDA operation started…

What would you suggest as the best option?

E.D_Riedijk · September 28, 2009, 5:43pm

Use cudaEventRecord(); there are numerous examples in the SDK that use it to time kernel execution.
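A minimal sketch of event-based timing might look like the following. The kernel `dummyKernel`, the problem size, and the launch configuration are placeholders invented for illustration, not code from this thread; only the `cudaEvent*` calls are the CUDA runtime API. Error checking is omitted to keep the sketch short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, present only so there is something to time.
__global__ void dummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

int main()
{
    const int n = 1 << 20;  // assumed problem size
    float *d_data = NULL;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Both events are recorded in stream 0, around the kernel launch.
    cudaEventRecord(start, 0);
    dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop, 0);

    // The launch is asynchronous, so wait until the stop event has completed.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed time in milliseconds
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

Because the events are timestamped on the GPU, the result does not include host-side launch overhead, which is usually what you want when timing the kernel alone.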

shiftreduce · September 28, 2009, 11:07pm

But are events the best way to evaluate the overall performance? Right now, what I want to know is how long, on average, it takes to run the following code:

copyHostToDevice(host_stream_in, device_stream_in);
kernel<<<1,1>>>(device_stream_in, device_stream_out);
copyDeviceToHost(device_stream_out, host_stream_out);

I’m unsure whether I should use events or wall-clock time: although I’m measuring the kernel’s performance, I still need to compare my solution against a similar CPU-based one.
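For a comparison against a CPU implementation, one possible approach (a sketch under stated assumptions, not necessarily the best method) is to time the whole round trip with a host clock and average over many runs. Here `placeholderKernel`, `roundTrip`, the buffer size, and the run count are hypothetical stand-ins for the copy and kernel calls above; the explicit synchronization ensures all GPU work has finished before the clock is read, and one untimed warm-up run keeps one-time startup costs out of the average.

```cuda
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real one.
__global__ void placeholderKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

// One full round trip: host -> device copy, kernel, device -> host copy.
static void roundTrip(const float *h_in, float *h_out,
                      float *d_in, float *d_out, int n)
{
    size_t bytes = n * sizeof(float);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    placeholderKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaDeviceSynchronize();  // make sure all GPU work is done
}

int main()
{
    const int n = 1 << 20;  // assumed problem size
    const int runs = 100;   // assumed number of timed repetitions

    std::vector<float> h_in(n, 1.0f), h_out(n);
    float *d_in = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_out, n * sizeof(float));

    // Untimed warm-up run.
    roundTrip(h_in.data(), h_out.data(), d_in, d_out, n);

    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < runs; ++r)
        roundTrip(h_in.data(), h_out.data(), d_in, d_out, n);
    auto t1 = std::chrono::steady_clock::now();

    double total_ms =
        std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("average round trip: %.3f ms over %d runs\n",
           total_ms / runs, runs);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Timing the CPU version with the same host clock and the same number of runs keeps the two measurements comparable.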

Quoc_Vinh · September 29, 2009, 3:48am

I usually use the CUDA profiler to measure execution time. In the report (*.csv) file, pay attention to the three columns “timestamp”, “gputime” and “cputime” (in microseconds).