MPS resource management

woosungkang · February 6, 2024, 4:27am

Hi guys,

I have some questions about MPS resource management.

First of all, how does MPS allocate GPU resources to multiple clients?
Is this done by allocating specific SMs to each kernel without overlap?

If so, my second question is: how does MPS allocate specific SMs to each kernel?
I found some information about a TMD mask, which indicates which SMs should be involved in a kernel.
Does MPS use this TMD masking?

Thanks.
BR.

Robert_Crovella · February 8, 2024, 8:53pm

In the general case (without specifying per-client percentages), the mental model should be the same as if the requests were issued from the same process, both for memory usage and compute utilization. SMs are not partitioned in the general case.

If you specify per-client resource partitioning percentages, then yes, the SMs are allocated to specific clients for use by any kernels launched by those clients. In the typical allocation scheme, specifying e.g. 20% for a client means that the client’s kernels cannot use more than 20% of the SMs on the GPU, so this is a more careful definition than “SMs are allocated to specific clients”. Note the statement from the linked doc:

Setting the limit does not reserve dedicated resources for any MPS client context. It simply limits how much resources can be used by a client context. Kernels launched from different MPS client contexts may execute on the same SM, depending on load-balancing.
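To make the "limit, not reservation" point concrete, here is a minimal Python sketch. It assumes the per-client percentage is supplied through the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable when the client process is started. The helper names (sm_cap, client_env), the 108-SM GPU, the client names, and the round-up-to-whole-SMs rule are all illustrative assumptions, not documented MPS behavior.

```python
import math
import os


def sm_cap(total_sms: int, percentage: float) -> int:
    """Upper bound on the SMs a client's kernels may occupy.

    Assumption for illustration: the cap is the percentage of the
    GPU's SMs, rounded up to a whole SM. The real granularity is
    hardware dependent.
    """
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    return math.ceil(total_sms * percentage / 100)


def client_env(percentage: float) -> dict:
    """Copy of the current environment with the MPS limit set.

    The variable must be in place before the client process creates
    its CUDA context.
    """
    env = dict(os.environ)
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(percentage)
    return env


TOTAL_SMS = 108  # hypothetical GPU size, for illustration only

# Hypothetical clients and their requested limits, in percent.
limits = {"client_a": 20, "client_b": 50, "client_c": 50}
caps = {name: sm_cap(TOTAL_SMS, pct) for name, pct in limits.items()}

print(caps)
# The caps can add up to more than the GPU has, because a limit is
# only a ceiling per client and not a dedicated set of SMs.
print(sum(caps.values()) > TOTAL_SMS)
```

A client could then be launched with something like subprocess.Popen(["./my_app"], env=client_env(20)), where ./my_app is a placeholder for your own binary. Which physical SMs a kernel actually lands on is still decided by the hardware and driver, so kernels from different clients may share an SM.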

As for whether MPS uses TMD masking: I don’t know, and I’m fairly confident that information is neither published nor specified by NVIDIA.
