I ran into a situation where an application timer uses gettimeofday to time both CUDA kernels and the memory copies between CPU and GPU.
This is the function used:
Looking at the results and comments from other users on this site, I suspect that this timing method is not accurate.
On Windows I generally use the OS timing function in addition to the timing results from nvprof. The nvprof results seem to be the most accurate, but that is just speculation, and I would like feedback on a method that will really stand up to scrutiny.
This is a situation where milliseconds matter, and the burden of proof is on me to show the site operators that gettimeofday may be missing some time related to memory operations (for example, kernel launches and async copies return control to the host before the work finishes, so a host timer stopped without a prior synchronize undercounts), and that nvprof is a more accurate method which is less likely to be manipulated.
In my experience the Windows timer matches the times in nvprof, but for my results I always use the nvprof times because I assume NVIDIA got it right.
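For a measurement that is hard to dispute, CUDA events time the work on the GPU itself and generally agree closely with what nvprof reports. A minimal sketch (the dummy kernel and sizes are mine, just for illustration):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy(float *x) { x[threadIdx.x] *= 2.0f; }

int main()
{
    float *d;
    cudaMalloc(&d, 256 * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    dummy<<<1, 256>>>(d);        // asynchronous: returns immediately
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // block until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}
```

The same pattern works around cudaMemcpy calls. If you must keep a host timer like gettimeofday, call cudaDeviceSynchronize() before reading the stop time so queued GPU work is actually included in the measurement.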