Limiting GPU Resource Usage per Docker Container with MPS Daemon

I’ve been using the MPS (Multi-Process Service) daemon to enforce resource-usage limits on individual processes via the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE and CUDA_MPS_PINNED_DEVICE_MEM_LIMIT environment variables, and it has been working well. However, I’ve run into a scenario I’m not sure how to address: is there a way to apply these limits collectively to an entire Docker container?

For example, if we set CUDA_MPS_PINNED_DEVICE_MEM_LIMIT=0=1000MB in the container’s environment variables and then launch two processes, each process gets its own 1000MB limit, so together they can use up to 2000MB. Is there a mechanism or strategy to enforce the limit across the entire container, so that in my case the two applications together cannot exceed 1000MB?
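To make the scenario concrete, here is a sketch of the kind of container launch I mean. The image name and application binaries are placeholders, and this assumes the host is already running an MPS control daemon that the container's CUDA processes connect to:

```shell
# Hypothetical example: "my-cuda-image", "app_a", and "app_b" are placeholders.
# CUDA_MPS_PINNED_DEVICE_MEM_LIMIT is read per MPS client process,
# so each process below gets its own 1000MB cap on device 0.
docker run --gpus all \
  -e CUDA_MPS_PINNED_DEVICE_MEM_LIMIT=0=1000MB \
  my-cuda-image \
  sh -c './app_a & ./app_b & wait'
# Result: app_a and app_b can each pin up to 1000MB,
# i.e. the container as a whole may use up to ~2000MB, not 1000MB.
```

The limit behaves this way because MPS applies it to each client that registers with the daemon, and the container boundary is invisible to MPS.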

Has anyone tackled this issue before? Is there a way to make the limit apply to the whole Docker container, capping its total resource usage at, say, the 1000MB from my example?

Has nobody encountered or tried to solve this problem? MPS limiting resources per process, with no way to limit per container, makes pod orchestration in Kubernetes impractical, because we have no way to guarantee that a pod will not consume more resources than it was allocated.

The only workaround that came to my mind is to limit the number of PIDs inside the container to one, so that no more than one process can run in it. However, this is not an ideal solution, because not every application can be adapted to run as a single main process. I would be grateful if someone could tell me where I can reach the MPS developers to ask this question.
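For reference, the PID-limiting workaround I have in mind can be expressed with Docker's `--pids-limit` flag. The image and binary names are placeholders; note that with a limit of 1, the application must never fork or spawn helper processes, or those spawns will fail:

```shell
# Workaround sketch: restrict the container to a single process so that
# only one MPS client can ever exist, making the per-process limit
# effectively a per-container limit. "my-cuda-image" and "./app" are
# placeholders for illustration.
docker run --gpus all \
  --pids-limit=1 \
  -e CUDA_MPS_PINNED_DEVICE_MEM_LIMIT=0=1000MB \
  my-cuda-image \
  ./app
# The pids cgroup now rejects any fork/clone inside the container,
# so ./app cannot even launch a subshell or worker process.
```

This is exactly why the workaround is so restrictive: many real workloads (multi-process data loaders, launcher scripts, init wrappers) cannot run under a PID limit of one.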

Still waiting for an answer from the NVIDIA team.