Will I get the correct kernel run time if I measure it as follows?
mykernel<<<1,1>>>(dev_a, dev_b, …);
Assume gettimeofday() is called properly …
I also measured the duration inside the kernel using the clock() function and got a larger time value than the kernel time measured above (which must be wrong …!). What might cause this timing error?
I am using CUDA 4.0 on Fermi …
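
One thing worth checking in a measurement like the one described above: a kernel launch returns control to the host right away, so a host timer such as gettimeofday() only covers the full kernel run if the host waits for the GPU before taking the second sample. The following is a minimal sketch under that assumption, not the original code. The kernel body, its `out` and `ticks` arguments, and the helper names `elapsed_ms` and `time_kernel_ms` are hypothetical stand-ins for illustration.

```cuda
#include <cstdio>
#include <ctime>
#include <sys/time.h>
#include <cuda_runtime.h>

// Hypothetical stand-in for mykernel: does a little work and records
// how many clock() ticks the single thread observed.
__global__ void mykernel(int *out, clock_t *ticks)
{
    clock_t start = clock();
    int acc = 0;
    for (int i = 0; i < 100000; ++i)
        acc += i % 7;
    *out = acc;                 // store the result so the loop is not dropped
    *ticks = clock() - start;   // elapsed device clock cycles, not seconds
}

// Convert two gettimeofday() samples to elapsed milliseconds.
static double elapsed_ms(const timeval &t0, const timeval &t1)
{
    return (t1.tv_sec - t0.tv_sec) * 1000.0 +
           (t1.tv_usec - t0.tv_usec) / 1000.0;
}

// Time one launch with the host clock. Returns milliseconds, or a
// negative value if any CUDA call fails. The in-kernel tick count is
// written to *ticks_out.
static double time_kernel_ms(clock_t *ticks_out)
{
    int *dev_out = NULL;
    clock_t *dev_ticks = NULL;

    // Allocating first also creates the CUDA context, so that one-time
    // start-up cost is kept out of the timed region.
    if (cudaMalloc((void **)&dev_out, sizeof(int)) != cudaSuccess)
        return -1.0;
    if (cudaMalloc((void **)&dev_ticks, sizeof(clock_t)) != cudaSuccess) {
        cudaFree(dev_out);
        return -1.0;
    }

    timeval t0, t1;
    gettimeofday(&t0, NULL);
    mykernel<<<1, 1>>>(dev_out, dev_ticks);
    // Without this wait, the second sample would be taken while the
    // kernel may still be running.
    cudaError_t err = cudaDeviceSynchronize();
    gettimeofday(&t1, NULL);

    if (err == cudaSuccess)
        err = cudaMemcpy(ticks_out, dev_ticks, sizeof(clock_t),
                         cudaMemcpyDeviceToHost);

    cudaFree(dev_out);
    cudaFree(dev_ticks);
    return (err == cudaSuccess) ? elapsed_ms(t0, t1) : -1.0;
}
```

Calling `time_kernel_ms(&ticks)` from the host gives the wall-clock time in milliseconds and the in-kernel tick count. The two are in different units: the value from clock() is a count of device clock cycles and has to be divided by the device clock frequency before it can be compared with the gettimeofday() result.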