I’ve been utilizing the MPS (Multi-Process Service) daemon to manage resource usage limits for processes using the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
and CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
environment variables, and it’s been working well. However, I’ve encountered a scenario that I’m not sure how to address. I’m curious if there’s a way to apply these limits collectively to an entire Docker container.
For example, if we set CUDA_MPS_PINNED_DEVICE_MEM_LIMIT=0=1000MB
in the container’s environment variables, launching two processes results in each having its own limit, effectively allowing them to use a total of 2000MB combined. Is there a mechanism or strategy to enforce the total limit across the entire container so that, in my case, two applications together cannot exceed the 1000MB limit?
Has anyone tackled this issue before, or is there a way to ensure that the collective limit applies to the whole Docker container, restricting the total resource usage to, for example, 1000MB as per my example?