High idle times between kernel executions

In my application I’m using 8 different kernels that are launched one after another. The problem is that I get long idle times (up to 5 ms, according to the Visual Profiler) between some of the kernel calls. To rule out the kernels themselves, I’ve deleted all the code inside them, so only the signatures remain and the kernels do nothing. The host is only responsible for launching the kernels, so there is no other work on that side.
Tested on a GTX 460 and a GTX 570 with the newest drivers (296.10); both show the same behavior.
I’ve added a screenshot of my Visual Profiler session (at that time the kernels were not empty, but the behavior is the same).

Can someone tell me what could cause this?

I hope that someone can help me.

I do a warm-up iteration first, but the behavior is the same in all cases.
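
In case it helps, here is a stripped-down sketch of the launch pattern, not my real code. The kernel names, the signature, the launch configuration and the use of only two distinct kernels (launched four times each to get eight launches) are placeholders. It times the back-to-back launches of empty kernels with CUDA events, which should show whether the gaps also appear outside the profiler.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernels: empty bodies, only a signature remains.
__global__ void kernelA(float *data, int n) {}
__global__ void kernelB(float *data, int n) {}

int main() {
    const int n = 1 << 20;  // assumed problem size
    float *d_data = 0;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const dim3 block(256);
    const dim3 grid((n + block.x - 1) / block.x);

    // Warm-up iteration so one-time initialization is not measured.
    kernelA<<<grid, block>>>(d_data, n);
    kernelB<<<grid, block>>>(d_data, n);
    cudaDeviceSynchronize();

    // Eight launches back to back, with no host work in between.
    cudaEventRecord(start, 0);
    for (int i = 0; i < 4; ++i) {
        kernelA<<<grid, block>>>(d_data, n);
        kernelB<<<grid, block>>>(d_data, n);
    }
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("8 empty kernel launches: %.3f ms total\n", ms);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

If the total measured here is far below what the profiler timeline suggests, the gaps may come from the profiling itself and not from the application, but that is only a guess on my part.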