Why does the same kernel run at a different speed when invoked more than once?

Hi, I am wondering whether there is a reason why the same kernel, launched with the same data, takes 5 times longer to execute on even invocations. Below is a screen capture of the profiler. As you can see, the second time the kernel is executed, it takes precisely 5 times as long to complete as the first time.

5 times seems excessive, and I do not see any clue in the profiler as to why this happens.

Moved to Nsight Compute forum.

Where is the link to the Nsight Compute forum?

Looks like this was never updated; moving it over now.