Why is there so much oscillation in the running time of CUDA kernels?
I’m experimenting with two approaches for my implementation, and I can’t tell which one is better because the run time fluctuates so much.
I usually see the run time vary by about 3 seconds. Why?
Same here. I’ve timed sub-ms kernels with a running time fluctuation of only a few % without X running. With X running, there are occasional 10% blips when the GUI updates something, and even more if you start using the GUI.
Sounds like you aren’t timing correctly, as E.D. Riedijk said.
I do call cudaThreadSynchronize() right after my kernel launch. Then I copy the results to the CPU and stop the timer. When I run the kernel without X, I get the same running time for some data sizes. But when I increase the data size, I see the oscillations again, even without X.
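One thing that might be worth ruling out: with the timer stopped after the copy back to the CPU, the measurement covers the kernel plus the device-to-host transfer, so variation in either one shows up in the same number. Below is a minimal sketch of timing only the kernel with CUDA events, after a warm-up launch, repeated several times. The `scale` kernel, the problem size, and the run count are all placeholders I made up for illustration, not anything from the original code. It uses `cudaDeviceSynchronize()`, which replaced the deprecated `cudaThreadSynchronize()`.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel being measured.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;   // illustrative problem size (assumption)
    const int runs = 20;     // illustrative number of timed repetitions (assumption)

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Warm-up launch so one-time startup costs are not counted in the timings.
    scale<<<blocks, threads>>>(d_data, n, 1.0f);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    std::vector<float> ms(runs);
    for (int r = 0; r < runs; ++r) {
        cudaEventRecord(start);                    // timestamp before the kernel
        scale<<<blocks, threads>>>(d_data, n, 1.0f);
        cudaEventRecord(stop);                     // timestamp after the kernel
        cudaEventSynchronize(stop);                // wait until the kernel has finished
        cudaEventElapsedTime(&ms[r], start, stop); // elapsed time in milliseconds
    }

    // Report the spread instead of a single sample.
    std::sort(ms.begin(), ms.end());
    std::printf("kernel only, %d runs: min %.3f ms, median %.3f ms, max %.3f ms\n",
                runs, ms.front(), ms[runs / 2], ms.back());

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

If the kernel-only numbers come out steady while the end-to-end time still swings, the variation is more likely in the copy or on the host side than in the kernel itself. Comparing the two approaches on the min or median of several runs should also be less noisy than comparing single measurements.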