Why does the same kernel run at a different speed when invoked more than once?

Hi, I am wondering whether there is a reason why the same kernel, launched with the same data, takes 5 times longer to execute on even invocations. Below is a screen capture of the profiler. As you can see, the second time the same kernel is executed, it takes precisely 5 times as long to complete as the first time.

A factor of 5 seems excessive, and I do not see any clue in the profiler as to why this happens.

Moved to Nsight Compute forum.