Does CUDA 5.5 MPS support concurrent execution of kernels from different processes?

What does "multi-process" actually mean at the level of kernel concurrency? Are those kernels still serialized, as before? Are they time-sliced? Or can the GPU run thread blocks from those kernels simultaneously at a given point in time?

In theory, yes, but only on compute capability 3.5 devices (Titan and Tesla K20), using Hyper-Q.

Is there any documentation that explains the details?

Check the example in the SDK.