I have written a parallel cyclic reduction code in CUDA and a Thomas algorithm code in C++. What would be the most appropriate way to compare the execution times of the two? I am timing the CUDA code using cutil, but I am not sure whether timing the C++ code with clock() from time.h gives a fair comparison.
Profilers are a good option (e.g. the CUDA Visual Profiler, or the profiler in Visual Studio Team Suite if you’re working on Windows).
Which timer to use also depends on the total execution time and the precision you are after. For example, the cutil timers aren’t very accurate unless the timed region lasts more than a few milliseconds.
So if your execution time is very short, I would definitely go for the profilers.
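If you still want a hand-rolled timer on the C++ side, here is a minimal sketch, assuming a C++11 compiler. It uses std::chrono::steady_clock instead of clock() and repeats the solve many times, so the measured interval ends up well above the timer resolution even when a single run is very short. The names thomas_solve and mean_milliseconds are made up for illustration and are not from your code; the solver is a plain textbook Thomas algorithm standing in for yours.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Textbook Thomas algorithm (stand-in for the real solver being timed).
// a: sub-diagonal (a[0] unused), b: main diagonal,
// c: super-diagonal (c[n-1] unused), d: right-hand side.
// c and d are taken by value so the caller's data stays intact between runs.
std::vector<double> thomas_solve(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::vector<double> c,
                                 std::vector<double> d) {
    const std::size_t n = d.size();
    if (n == 0) return d;

    // Forward elimination.
    c[0] /= b[0];
    d[0] /= b[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double m = b[i] - a[i] * c[i - 1];
        c[i] /= m;
        d[i] = (d[i] - a[i] * d[i - 1]) / m;
    }

    // Back substitution; d ends up holding the solution.
    for (std::size_t i = n - 1; i-- > 0;) {
        d[i] -= c[i] * d[i + 1];
    }
    return d;
}

// Hypothetical helper: runs `work` once as a warm-up, then `repeats` times
// inside the timed region, and returns the mean time per call in milliseconds.
template <typename F>
double mean_milliseconds(F&& work, int repeats) {
    using clock = std::chrono::steady_clock;  // monotonic, unlike wall-clock time

    work();  // warm-up run, not timed

    const auto start = clock::now();
    for (int r = 0; r < repeats; ++r) {
        work();
    }
    const auto stop = clock::now();

    const std::chrono::duration<double, std::milli> total = stop - start;
    return total.count() / repeats;
}
```

For example, `mean_milliseconds([&] { sink += thomas_solve(a, b, c, d)[0]; }, 1000)` gives the average time per solve; accumulating into `sink` (a double you print afterwards) is one way to keep the compiler from optimizing the work away. For a like-for-like comparison, time the same problem size on both sides and decide up front whether the GPU figure should include host-to-device transfers.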