This is odd then. I’ve never seen this behavior before. Do you have any active program rendering to the display, even if it is only 2D? In my own project, kernel times are essentially constant (about 1 ms each) and only occasionally jump to 1.2 ms or more, presumably because the mouse cursor or something else is being drawn on the screen. That is nothing like the factor-of-2 increase you are seeing. If I grab and drag a window around the screen, the times do go up further, though.
I assume you are running the kernel on the same dataset over and over again, so that changing data cannot change the run time. Could you run your program with the profiler activated? I’m curious to see whether these increasing times show up in cuda_profile.log too. That would also tell us more about where the time is going: whether the kernel really takes twice as much gputime by the end, or whether the extra time is driver overhead that shows up in cputime.
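If the log is long, a small script can make the gputime vs. cputime comparison easier to read. This is a minimal sketch that assumes the legacy command-line profiler (enabled with the environment variable CUDA_PROFILE=1) writing lines in its `key=[ value ]` format, with times assumed to be in microseconds. The function names `parse_profile_log` and `first_vs_last` are hypothetical helpers, not part of any NVIDIA tool.

```python
import re
from collections import defaultdict

# Matches one "key=[ value ]" field, e.g. "gputime=[ 1000.123 ]".
# Assumes the legacy CUDA command-line profiler log format.
FIELD_RE = re.compile(r"(\w+)=\[\s*(.*?)\s*\]")


def parse_profile_log(lines):
    """Map each method name to a list of (gputime, cputime) in launch order.

    Comment lines ('#') and lines without method/gputime fields
    (such as the column header) are skipped.
    """
    launches = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue
        fields = dict(FIELD_RE.findall(line))
        if "method" not in fields or "gputime" not in fields:
            continue
        gpu = float(fields["gputime"])
        cpu = float(fields.get("cputime", "nan"))  # nan if cputime is absent
        launches[fields["method"]].append((gpu, cpu))
    return dict(launches)


def _mean(values):
    return sum(values) / len(values)


def first_vs_last(launches, n=10):
    """Compare the mean times of the first n and last n launches per method.

    n is reduced to half the launch count (minimum 1) so the two
    windows do not overlap for short runs.
    """
    summary = {}
    for method, times in launches.items():
        k = max(1, min(n, len(times) // 2))
        head, tail = times[:k], times[-k:]
        summary[method] = {
            "gpu_first": _mean([g for g, _ in head]),
            "gpu_last": _mean([g for g, _ in tail]),
            "cpu_first": _mean([c for _, c in head]),
            "cpu_last": _mean([c for _, c in tail]),
        }
    return summary
```

For example, `first_vs_last(parse_profile_log(open("cuda_profile.log")))` gives one entry per kernel or memcpy. If `gpu_last` is roughly double `gpu_first`, the kernel itself is slowing down; if gputime stays flat while cputime grows, the extra time is outside the kernel.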