Feature Request: Dynamic parallelism support under MPS

Multi-Process Service (MPS) is a useful intermediation layer between the GPU device and multi-process host applications, such as a database management system that must handle an unpredictable number of concurrent clients.

As the MPS documentation says (https://docs.nvidia.com/deploy/mps/index.html#topic_3_3_2), MPS does not support GPU kernels that use dynamic parallelism internally. If an application tries to load such a kernel, cuModuleLoadData() returns CUDA_ERROR_NOT_SUPPORTED.
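For anyone hitting the same error, here is a minimal host-side sketch of how an application might react to it. It is only an illustration under assumptions: the loader is passed in as a callable (in a real program it would wrap cuModuleLoadData()), the names ModuleLoader, LoadOutcome and load_with_fallback are hypothetical, the idea of shipping a second module image built without dynamic parallelism is my own assumption, and the two constants are redefined to mirror the CUresult values so the snippet compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Values mirror the CUresult enum in cuda.h; redefined here so that this
// sketch compiles without the CUDA toolkit installed.
constexpr int kCudaSuccess = 0;              // CUDA_SUCCESS
constexpr int kCudaErrorNotSupported = 801;  // CUDA_ERROR_NOT_SUPPORTED

// Hypothetical loader type: takes a module image name and returns a
// CUresult-style code. A real loader would call cuModuleLoadData().
using ModuleLoader = std::function<int(const std::string&)>;

enum class LoadOutcome { Primary, Fallback, Failed };

// Try the primary image first. If it is rejected as "not supported"
// (which is what the post reports under MPS for modules that use dynamic
// parallelism), retry with a fallback image that is assumed to avoid it.
LoadOutcome load_with_fallback(const ModuleLoader& load_module,
                               const std::string& primary_image,
                               const std::string& fallback_image) {
    assert(load_module && "a loader must be supplied");

    const int rc = load_module(primary_image);
    if (rc == kCudaSuccess) {
        return LoadOutcome::Primary;
    }
    if (rc != kCudaErrorNotSupported) {
        return LoadOutcome::Failed;  // unrelated error: do not mask it
    }
    return load_module(fallback_image) == kCudaSuccess
               ? LoadOutcome::Fallback
               : LoadOutcome::Failed;
}
```

With a loader that rejects only the primary image, load_with_fallback() reports LoadOutcome::Fallback; any error other than "not supported" is reported as LoadOutcome::Failed rather than silently retried.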

My application does not launch GPU kernels dynamically right now. However, MPS considers a call to cudaDeviceSynchronize(), which synchronizes other concurrent kernels, to be a kind of dynamic parallelism.
Therefore, we currently recommend that users disable MPS, even though MPS would be helpful for database workloads if this were supported.

I’m not sure why dynamic parallelism is prohibited on MPS.

If there is little or no technical difficulty on the MPS side, I would like to see dynamic parallelism supported soon.

As an NVIDIA developer mentioned before, is it just a scheduling constraint?
https://devblogs.nvidia.com/inside-volta/#comment-3304407376

File your request at developer.nvidia.com.

You’ll need to be a registered developer first.

Then use the bug reporting form to request this feature. Mention that this is a request for enhancement (RFE).

It has already been filed as BUG-ID: 2172828.

When we escalated the request for enhancement to NVIDIA via our channel partner,
their contact person also recommended that we start a discussion topic on the developer forum.