Timing variations

Hello,

I’m using CUDA Fortran to parallelize a CFD code, using the cuBLAS and cuSPARSE libraries along with my own kernels.
I was measuring the time of a subroutine that calls other subroutines and my own kernel.
The strange thing is that one run of the program may give an average time of ~60 ms for my kernel and ~1500 ms for the whole subroutine I’m measuring. If I stop the program and start it again, it gives an average of ~80 ms for my kernel and ~1800 ms for the whole subroutine. If I stop and start it once more, I get back to ~60 ms and ~1500 ms, and so on. It’s strange but reproducible.

Is there any explanation for this behavior and a way to control it? This means variations of 10% in the overall program time without touching the code…