Some of the CUDA sample code provides a simple performance benchmarking mechanism, calling cudaEventRecord before and after the kernel launch. I noticed that in all of these samples there is a kernel launch before cudaEventRecord(start), and the comment on that line says “warmup to avoid timing startup”. What does this mean?
It just means that there is a one-time, per-application startup cost to set up the CUDA context. This can range from roughly 100 to 250 ms. Once that cost has been paid, kernels can be timed accurately.
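For illustration, here is a minimal sketch of the warmup-then-time pattern the samples use. The kernel `scale`, the helper `timeLaunchMs`, and the array size are made-up names and values for this example, not taken from any particular sample. Error checking is left out to keep it short, so real code should check the return value of each CUDA call.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: scales an array in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: times a single launch of scale() with CUDA events
// and returns the elapsed time in milliseconds.
float timeLaunchMs(float *d_data, int n) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaEventRecord(start);                 // timestamp before the launch
    scale<<<blocks, threads>>>(d_data, n, 1.0f);
    cudaEventRecord(stop);                  // timestamp after the launch
    cudaEventSynchronize(stop);             // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 20;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Warmup launch: absorbs one-time startup costs so they are not timed.
    scale<<<(n + 255) / 256, 256>>>(d_data, n, 1.0f);
    cudaDeviceSynchronize();

    // Timed launch: measured after the startup cost has already been paid.
    float ms = timeLaunchMs(d_data, n);
    printf("timed launch: %.3f ms\n", ms);

    cudaFree(d_data);
    return 0;
}
```

To see the effect on your own machine, you could call the timing helper twice in a row without the warmup launch and compare the two numbers. How large the difference is will depend on the GPU, driver, and CUDA version.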
Thank you. Very clearly explained.