Limiting GPU Resource Usage per Docker Container with MPS Daemon

I’ve been using the MPS (Multi-Process Service) daemon to enforce resource-usage limits on individual processes via the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE and CUDA_MPS_PINNED_DEVICE_MEM_LIMIT environment variables, and it has been working well. However, I’ve run into a scenario I’m not sure how to address: is there a way to apply these limits collectively to an entire Docker container?

For example, if we set CUDA_MPS_PINNED_DEVICE_MEM_LIMIT=0=1000MB in the container’s environment variables and then launch two processes, each process gets its own 1000MB limit, so together they can use up to 2000MB. Is there a mechanism or strategy to enforce the limit across the entire container, so that in my case the two applications together cannot exceed 1000MB?
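To make the scenario concrete, here is a sketch of the kind of container launch I mean. The image name and application binaries are placeholders, and this assumes the host is already running an MPS control daemon that the container's CUDA processes connect to:

```shell
# Hypothetical example: "my-cuda-image", "app_a", and "app_b" are placeholders.
# CUDA_MPS_PINNED_DEVICE_MEM_LIMIT is read per MPS client process,
# so each process below gets its own 1000MB cap on device 0.
docker run --gpus all \
  -e CUDA_MPS_PINNED_DEVICE_MEM_LIMIT=0=1000MB \
  my-cuda-image \
  sh -c './app_a & ./app_b & wait'
# Result: app_a and app_b can each pin up to 1000MB,
# i.e. the container as a whole may use up to ~2000MB, not 1000MB.
```

The limit behaves this way because MPS applies it to each client that registers with the daemon, and the container boundary is invisible to MPS.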

Has anyone tackled this issue before? Is there a way to make the limit apply to the whole Docker container, capping its total resource usage at, say, the 1000MB from my example?

Has nobody encountered or tried to solve this problem? MPS limiting resources per process, with no way to limit per container, makes pod orchestration in Kubernetes impractical, because we have no way to guarantee that a pod will not consume more resources than it was allocated.

The only workaround that came to my mind is to limit the number of PIDs inside the container to one, so that no more than one process can run in it. However, this is not an ideal solution, because not every application can be adapted to run as a single main process. I would be grateful if someone could tell me where I can reach the MPS developers to ask this question.
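For reference, the PID-limiting workaround I have in mind can be expressed with Docker's `--pids-limit` flag. The image and binary names are placeholders; note that with a limit of 1, the application must never fork or spawn helper processes, or those spawns will fail:

```shell
# Workaround sketch: restrict the container to a single process so that
# only one MPS client can ever exist, making the per-process limit
# effectively a per-container limit. "my-cuda-image" and "./app" are
# placeholders for illustration.
docker run --gpus all \
  --pids-limit=1 \
  -e CUDA_MPS_PINNED_DEVICE_MEM_LIMIT=0=1000MB \
  my-cuda-image \
  ./app
# The pids cgroup now rejects any fork/clone inside the container,
# so ./app cannot even launch a subshell or worker process.
```

This is exactly why the workaround is so restrictive: many real workloads (multi-process data loaders, launcher scripts, init wrappers) cannot run under a PID limit of one.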

Still waiting for an answer from the NVIDIA team.