What’s the best method for measuring execution times in order to compare GPU and CPU times?
GPU time is easy to measure by calling CUDA runtime functions, but is it possible to call these same functions before and after CPU code in order to measure CPU times?
CUDA events are by far the most reliable benchmarking method, but they only work within CUDA streams, so they time work submitted to the GPU rather than host code. Your best bet for benchmarking CPU code is whatever high-resolution timer your OS offers.
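A minimal sketch of the CPU side, assuming a C++11 compiler: std::chrono::steady_clock is a monotonic clock that common standard libraries back with the OS's high-resolution counter, so it gives a portable stand-in for "whatever timer your OS offers". The helper names time_cpu_ms and cpu_sum are hypothetical, for illustration only. The result is reported in milliseconds so it can be compared directly with the float that cudaEventElapsedTime() returns.

```cpp
#include <chrono>
#include <cstdint>

// Times an arbitrary CPU-side callable with std::chrono::steady_clock
// (monotonic, so it never runs backwards) and returns elapsed wall-clock
// time in milliseconds, the same unit cudaEventElapsedTime() uses.
template <typename F>
double time_cpu_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical CPU workload used only for illustration: sums 0..n-1.
// The volatile accumulator keeps the compiler from optimizing the loop away.
inline std::uint64_t cpu_sum(std::uint64_t n) {
    volatile std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < n; ++i) acc = acc + i;
    return acc;
}
```

Usage: `double ms = time_cpu_ms([] { cpu_sum(10000000); });`. Wrapping the work in a lambda keeps the timer code separate from whatever CPU routine you are comparing against the GPU version.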
I’ve had good luck with the CUDA utility toolkit’s (cutil) cut[Create|Start|Stop|Restart]Timer functions, queried with cutGetTimerValue(). On Windows I believe these map to a QueryPerformanceCounter API call, which is a bit slow to read but very, very precise (~0.5 µs). On Linux, they appear to map to the gettimeofday() function, but I don’t know the precision of that counter. If you don’t need much precision on Windows, the GetTickCount() API is very fast, but in my experience its resolution is only about 15 ms.
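For the open question about gettimeofday() on Linux, here is a hedged POSIX-only sketch (not part of cutil; the function names are hypothetical) that checks the granularity on your own machine. gettimeofday() fills a struct timeval in whole microseconds, so it can never resolve anything finer than 1 µs; the helper spins until the reading changes and keeps the smallest step it sees. It also reports the resolution that clock_getres() advertises for CLOCK_MONOTONIC, the usual alternative on Linux, since clock_gettime(CLOCK_MONOTONIC, ...) counts in nanoseconds and is not affected by wall-clock adjustments.

```cpp
#include <sys/time.h>
#include <time.h>
#include <cstdint>
#include <limits>

// Returns the current gettimeofday() reading in microseconds.
inline std::int64_t now_us() {
    timeval tv;
    gettimeofday(&tv, nullptr);
    return static_cast<std::int64_t>(tv.tv_sec) * 1000000 + tv.tv_usec;
}

// Estimates the smallest step gettimeofday() actually advances by: spin until
// the reading changes and keep the minimum positive delta over several samples.
// Negative deltas (the wall clock being adjusted backwards) are ignored.
inline std::int64_t gettimeofday_granularity_us(int samples = 100) {
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int s = 0; s < samples; ++s) {
        const std::int64_t t0 = now_us();
        std::int64_t t1 = now_us();
        while (t1 == t0) t1 = now_us();
        if (t1 > t0 && t1 - t0 < best) best = t1 - t0;
    }
    return best;
}

// Reports the resolution the kernel advertises for CLOCK_MONOTONIC, in ns.
inline long monotonic_resolution_ns() {
    timespec res{};
    clock_getres(CLOCK_MONOTONIC, &res);
    return res.tv_sec * 1000000000L + res.tv_nsec;
}
```

A result of 1 from gettimeofday_granularity_us() means the counter is advancing at the best rate its microsecond format allows; anything larger means the underlying clock source is coarser on that system.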