I’m running a basic task in a kernel launched with blocks(8,1,1) as the grid and threads(4,8,16) as the block size. I measure the GPU time with CUDA events and the cudaEventElapsedTime function.
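Below is a minimal sketch of this kind of timing setup. It assumes a hypothetical kernel `someTask` that doubles each element of a device array; the kernel body and the `timeSomeTaskMs` helper are illustrative names, not the original code. Note that cudaEventElapsedTime reports milliseconds, not seconds.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for "some basic task": doubles each element.
__global__ void someTask(float* data, int n)
{
    // Flatten the 3-D thread index inside the block (4 x 8 x 16 = 512 threads).
    int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    int tid = threadIdx.x
            + threadIdx.y * blockDim.x
            + threadIdx.z * blockDim.x * blockDim.y;
    int idx = blockIdx.x * threadsPerBlock + tid;
    if (idx < n) {
        data[idx] *= 2.0f;
    }
}

// Times a single launch with CUDA events and returns the elapsed time in
// milliseconds (multiply by 1e-3 for seconds). Error checking on the CUDA
// calls is omitted for brevity.
float timeSomeTaskMs(float* d_data, int n)
{
    dim3 blocks(8, 1, 1);
    dim3 threads(4, 8, 16);
    // The fixed grid has 8 * 512 = 4096 threads, one element per thread.
    assert(n <= 8 * 4 * 8 * 16);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                      // recorded on the default stream
    someTask<<<blocks, threads>>>(d_data, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);                  // wait until 'stop' has completed

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

Called once on a 4096-element device buffer allocated with cudaMalloc, this returns the duration of a single launch, which is the kind of one-shot measurement that varies from run to run.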
When I ran the program, I got 0.000123 sec. But when I run the same program again and again, I get 0.000213 sec, 0.000146 sec, 0.000170 sec, 0.000202 sec, 0.000299 sec, etc.
So I’m confused about which value to report as the GPU time taken to execute this task. Which time should be considered correct?
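Rather than picking any single run, a common practice is to do at least one untimed warm-up launch (the first launch in a process may include one-time costs), time many launches within the same process, and report a robust statistic such as the median or the minimum alongside the mean. The following is a host-only sketch of that summarizing step; the `TimingSummary` struct and `summarizeTimings` function are hypothetical names, and the samples are assumed to come from repeated cudaEventElapsedTime calls.

```cuda
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of repeated kernel timings, all in milliseconds.
struct TimingSummary {
    float minMs;
    float medianMs;
    float meanMs;
};

// Hypothetical helper: drops the first `warmup` samples, then reports the
// minimum, median, and mean of the remaining ones.
TimingSummary summarizeTimings(std::vector<float> samples, std::size_t warmup)
{
    assert(samples.size() > warmup);  // need at least one measured run
    samples.erase(samples.begin(),
                  samples.begin() + static_cast<std::ptrdiff_t>(warmup));
    std::sort(samples.begin(), samples.end());

    std::size_t n = samples.size();
    float median = (n % 2 == 1)
        ? samples[n / 2]
        : 0.5f * (samples[n / 2 - 1] + samples[n / 2]);
    float mean = std::accumulate(samples.begin(), samples.end(), 0.0f)
               / static_cast<float>(n);

    return TimingSummary{samples.front(), median, mean};
}
```

With the six times listed above converted to milliseconds and no samples dropped, this gives a minimum of 0.123 ms, a median of 0.186 ms, and a mean of about 0.192 ms. The median is less sensitive than the mean to an occasional slow run such as the 0.299 ms one.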
Thanks in advance