Question about concurrent kernels with multithreading

Hi all,

See the image below. We spawn 5 host threads to run different, independent cases in our application. Many kernels overlap, and they are concentrated inside the red boxes. My question is: why are there gaps between the red boxes, and how can this result be explained?
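For reference, here is a minimal sketch of this kind of setup. It assumes each host thread launches its case onto its own non-blocking stream; the post does not say which streams the real application uses. `busyKernel`, `runCase`, `runAllCases`, the grid size, and the launch counts are hypothetical placeholders for the real cases. Buffers and streams are created before the threads start and released after they finish, because device memory allocation (`cudaMalloc`) and `cudaFree` can implicitly synchronize the device and would otherwise interfere with overlap.

```cuda
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for one independent "case": a kernel that just burns time.
__global__ void busyKernel(float* data, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < iters; ++k)
            v = v * 0.999f + 0.001f;
        data[i] = v;
    }
}

// Work done by one host thread: launch a series of kernels on its own stream,
// then wait on that stream only. Sets *ok to 1 if no CUDA error was reported.
void runCase(cudaStream_t stream, float* data, int n, int launches, int* ok)
{
    for (int i = 0; i < launches; ++i)
        busyKernel<<<(n + 255) / 256, 256, 0, stream>>>(data, n, 10000);
    *ok = (cudaGetLastError() == cudaSuccess) &&
          (cudaStreamSynchronize(stream) == cudaSuccess);
}

// Runs numThreads independent cases concurrently; returns how many succeeded.
int runAllCases(int numThreads, int launches)
{
    const int n = 1 << 12;  // small grid (16 blocks) so one kernel leaves SMs free for others
    std::vector<cudaStream_t> streams(numThreads);
    std::vector<float*> buffers(numThreads, nullptr);
    std::vector<int> ok(numThreads, 0);

    // Allocate everything up front, outside the worker threads.
    for (int t = 0; t < numThreads; ++t) {
        cudaStreamCreateWithFlags(&streams[t], cudaStreamNonBlocking);
        cudaMalloc(&buffers[t], n * sizeof(float));
        cudaMemset(buffers[t], 0, n * sizeof(float));
    }
    cudaDeviceSynchronize();

    // One host thread per case, each bound to its own stream and buffer.
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t)
        workers.emplace_back(runCase, streams[t], buffers[t], n, launches, &ok[t]);
    for (auto& w : workers)
        w.join();

    // Free resources only after all threads have finished launching and syncing.
    int succeeded = 0;
    for (int t = 0; t < numThreads; ++t) {
        succeeded += ok[t];
        cudaFree(buffers[t]);
        cudaStreamDestroy(streams[t]);
    }
    return succeeded;
}

int main()
{
    int succeeded = runAllCases(5, 20);
    std::printf("%d of 5 cases finished without CUDA errors\n", succeeded);
    return succeeded == 5 ? 0 : 1;
}
```

Profiling a program like this (for example with Nsight Systems or nvprof) gives a timeline that can be compared directly with the one in the image.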

OS: Windows 10
Card: GTX 1080
CUDA version: 10.1

Please excuse my English.

http://i.imgur.com/IAj6b7k.jpg