Multi-Process Service (MPS) is a useful intermediation layer between the GPU device and multi-process host applications, such as a database management system that has to handle an unpredictable number of concurrent clients.
As the MPS documentation says (https://docs.nvidia.com/deploy/mps/index.html#topic_3_3_2), MPS does not support GPU kernels that use dynamic parallelism internally. If an application tries to load such a kernel, cuModuleLoadData() returns CUDA_ERROR_NOT_SUPPORTED.
My application does not launch GPU kernels dynamically right now; however, MPS considers cudaDeviceSynchronize(), which synchronizes with other concurrent kernels, to be a kind of dynamic parallelism.
Therefore, we currently recommend that users disable MPS, even though it would be helpful for database workloads if this were supported.
I’m not sure why dynamic parallelism is prohibited on MPS.
If there is little or no technical difficulty on the MPS side, I would like to see dynamic parallelism supported soon.
As an NVIDIA developer mentioned before, is it just a scheduling constraint?