Hello scientists :-),
I have a short question about timing CUDA code. I’ve read a lot of topics about timing on this forum; two of them are here (just for reference):
However, I was not able to find a definitive answer to the following question:
Is it better to use host-based timing with cudaThreadSynchronize(), or are CUDA events preferred?
My setup and goals:
- single device: 8800GTX
- operations are issued step by step to stream 0; no concurrency, no multiple streams
- my task is to compare CPU vs. GPU timings of some equivalent pieces of code
- I’m timing single kernel executions as well as whole batches of successive kernel launches.
Which of the two timing methods would you advise, and why? I’m currently using the first one (cudaThreadSynchronize() plus host-based timing with clock_gettime(CLOCK_REALTIME, timer) on Linux machines). I would appreciate your advice or your own experience. Thank you.
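
For concreteness, here is a minimal sketch of the two approaches being compared, run on the same kernel in stream 0. The kernel `dummyKernel`, the helper `elapsedMs`, and the problem size are hypothetical placeholders for illustration only, not the real code being measured. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <time.h>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel; it stands in for the real code being timed.
__global__ void dummyKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 2.0f + 1.0f;
}

// Hypothetical helper: elapsed milliseconds between two timespec values.
static double elapsedMs(const timespec &start, const timespec &stop)
{
    return (stop.tv_sec - start.tv_sec) * 1e3 +
           (stop.tv_nsec - start.tv_nsec) * 1e-6;
}

int main()
{
    const int n = 1 << 20;          // assumed problem size
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_data = 0;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Warm-up launch so one-time initialization is not part of either measurement.
    dummyKernel<<<blocks, threads>>>(d_data, n);
    cudaThreadSynchronize();        // newer toolkits use cudaDeviceSynchronize()

    // Method 1: host timer around the launch plus cudaThreadSynchronize().
    timespec t0, t1;
    clock_gettime(CLOCK_REALTIME, &t0);
    dummyKernel<<<blocks, threads>>>(d_data, n);
    cudaThreadSynchronize();        // the launch is asynchronous; wait for completion
    clock_gettime(CLOCK_REALTIME, &t1);
    printf("host timer: %.3f ms\n", elapsedMs(t0, t1));

    // Method 2: CUDA events recorded in stream 0 around the same launch.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    dummyKernel<<<blocks, threads>>>(d_data, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);     // block the host until 'stop' has been reached
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("CUDA events: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

This should build with something like `nvcc timing.cu -o timing` (older glibc versions may also need `-lrt` for clock_gettime). The host timer includes launch and synchronization overhead on the CPU side, while the events are timestamped on the GPU, so the two numbers are not expected to match exactly.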