How to Enforce Per-Client Memory and SM Limits in CUDA MPS?

I’m trying to enforce per-client resource limits in CUDA MPS but not seeing the expected behavior.

In my Kubernetes Pod spec, I set the following environment variables:

- name: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
  value: "40"
- name: CUDA_MPS_ENABLE_PER_CTX_DEVICE_MULTIPROCESSOR_PARTITIONING
  value: "1"
- name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
  value: "0=40G"
- name: CUDA_MPS_CLIENT_PRIORITY
  value: "0"
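
For reference, my understanding is that the same limits can also be set as server-wide defaults through the control daemon (these are nvidia-cuda-mps-control commands; I haven't gone down this path yet, so take this as a sketch, and I believe the defaults only apply to servers started afterwards):

echo "set_default_active_thread_percentage 40" | nvidia-cuda-mps-control
echo "set_default_device_pinned_mem_limit 0 40G" | nvidia-cuda-mps-control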

However, when I check nvidia-smi pmon, the sm% is still close to 100%. Querying the active thread percentage through the control daemon:

echo "get_active_thread_percentage 7078" | nvidia-cuda-mps-control

returns 100.0. So the limits do not appear to be applied.
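
For completeness, these are the other control-daemon queries I'm aware of for inspecting the server-side limits (a sketch; I'm assuming 7078 above is the MPS server PID reported by get_server_list):

echo "get_server_list" | nvidia-cuda-mps-control
echo "get_default_active_thread_percentage" | nvidia-cuda-mps-control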

What am I missing? Does MPS ignore SM limits set via CUDA_MPS_ACTIVE_THREAD_PERCENTAGE when per-context partitioning is enabled? Is CUDA_MPS_CLIENT_PRIORITY relevant here? How do I ensure each client only uses its intended share of SMs and memory?
