Performance of preemption due to time-slicing

I am using a GTX 1080. According to the Stack Overflow question "How do I use Nvidia Multi-process Service (MPS) to run multiple non-MPI CUDA applications?", the GPU will use job preemption and time-slicing to interleave multiple kernels. What is the performance hit from this? I couldn't find any literature on it.

Thanks