Performance cost of preemption due to time-slicing

I am using a GTX 1080. According to https://stackoverflow.com/questions/34709749/how-do-i-use-nvidia-multi-process-service-mps-to-run-multiple-non-mpi-cuda-app, the GPU will use job preemption and time-slicing to interleave multiple kernels. What is the performance hit due to this? I couldn’t find any literature on it.

Thanks