Hello, I have a question about interference between processes when using MPS. First, I ran experiments to measure interference: I started by running a single process and checking the latency of DenseNet121, increasing the MPS percentage by 10 for each test (10, 20, …, 90). After that, I launched 2 processes with the MPS percentages paired as (10, 90), (20, 80), …, (50, 50) and again checked the duration of DenseNet121.
It turned out that the process running with the higher MPS percentage sees lower interference, while the process running with the lower percentage sees higher interference. I think it has something to do with the lower-percentage process getting fewer memory resources, but I'm not sure. Can you please explain why this kind of phenomenon happens? Below is part of the result of my experiment.
Duration of the model in a single process with 10% MPS: 0.0576s
Duration of the model in a single process with 90% MPS: 0.0141s
Duration of the model in the 10% MPS process with a 90% MPS process running together: 0.0851s
Duration of the model in the 90% MPS process with a 10% MPS process running together: 0.0151s
Duration of the model in a single process with 20% MPS: 0.0308s
Duration of the model in a single process with 80% MPS: 0.0143s
Duration of the model in the 20% MPS process with an 80% MPS process running together: 0.0465s
Duration of the model in the 80% MPS process with a 20% MPS process running together: 0.0163s
So the 10% and 20% MPS processes see interference of about 47% and 50%, while the 80% and 90% MPS processes see interference of only about 13% and 7%.
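For context, the kind of measurement I mean is roughly the following (a simplified PyTorch sketch, not my exact script; the model setup, input size, iteration counts, and the way CUDA_MPS_ACTIVE_THREAD_PERCENTAGE is exported are illustrative):

```python
# Sketch of one measurement process. Run it with the MPS daemon active and the
# percentage exported in the environment, e.g.
#   CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=10 python measure.py
# (model, input size, and iteration counts here are illustrative)
import time
import torch
import torchvision

model = torchvision.models.densenet121().cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")

with torch.no_grad():
    for _ in range(10):          # warm-up iterations
        model(x)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(100):         # timed iterations
        model(x)
    torch.cuda.synchronize()
    duration = (time.perf_counter() - start) / 100

print(f"mean forward-pass duration: {duration:.4f}s")

# Interference is then the relative slowdown versus the solo run, e.g.
# (0.0851 - 0.0576) / 0.0576 ≈ 0.48 for the 10% process in the (10, 90) pair.
```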
Thank you in advance!
First, MPS, even with resource percentage provisioning, doesn’t guarantee that there will be no interference between clients. The MPS percentage partitioning divides the execution (SM) resources (AFAIK) but nothing else. In particular, there is no division of memory bandwidth (other than what may be inherent in the GPU design and the SM partitioning).
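For clarity on what that percentage controls: each client process picks up its own active-thread (SM) percentage when it connects to the MPS server, typically via the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable. A minimal launcher sketch (the measure.py script name is just a placeholder for your timing code):

```python
# Illustrative launcher: start two MPS client processes with different
# active-thread percentages. Assumes the MPS control daemon is already running
# and that "measure.py" is your timing script (placeholder name).
import os
import subprocess

def launch(percentage: int) -> subprocess.Popen:
    env = dict(os.environ)
    # Each MPS client reads this variable when it creates its CUDA context;
    # it limits the fraction of SMs that client's kernels may occupy.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(percentage)
    return subprocess.Popen(["python", "measure.py"], env=env)

clients = [launch(10), launch(90)]   # e.g. the (10, 90) pairing from the test
for client in clients:
    client.wait()
```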
What I see in your output is that the model running by itself with 90% of the GPU (0.0141s) seems to run a bit faster than the same model running at 90% with another model running at 10% alongside it (0.0151s). That doesn’t seem surprising at all. The model running by itself at 90% has no competition for memory bandwidth. OTOH, when another client is present, there is competition for memory bandwidth, which might mean the model running at 90% takes a bit longer.
Memory bandwidth probably isn’t the only resource that isn’t expressly partitioned, but it is likely the most important one when thinking about these things.
Beyond that, I probably won’t be able to explain the exact relationship between percentage partitioning and the level of interference. Since we don’t have an exact description (that I know of) of how SMs utilize the available memory bandwidth, we can’t draw firm conclusions from it. However, if we assume that a single SM may be able to generate more memory traffic than you would predict from (available bandwidth)/(SM count) on the GPU in question, then it stands to reason that a process running on a “small” partition might be able to consume a lot of memory bandwidth by itself, but is more heavily impacted when there is competition. That is just hand-waving, though; I can’t offer a precise explanation.
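To make that hand-waving slightly more concrete with purely invented numbers (not measurements of any particular GPU):

```python
# Toy illustration of the argument above, with made-up numbers.
total_bw_gbs = 900.0   # hypothetical total DRAM bandwidth, GB/s
num_sms = 80           # hypothetical SM count
per_sm_share = total_bw_gbs / num_sms        # ~11 GB/s "fair share" per SM

# If a single SM can in practice generate, say, 3x its fair share of traffic,
# a 10% partition (8 SMs) could by itself consume far more than 10% of the
# total bandwidth while running alone:
small_partition_demand = 0.10 * num_sms * 3 * per_sm_share   # ~270 GB/s

# When a 90% partition then shows up and saturates DRAM, the small partition
# loses a large fraction of the bandwidth it was actually using, while the
# large partition was already close to bandwidth-limited and changes
# comparatively little.
print(f"per-SM share: {per_sm_share:.1f} GB/s, "
      f"10% partition solo demand: {small_partition_demand:.0f} GB/s")
```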
MIG, on the other hand, tries to do a better job of full GPU partitioning, so that clients running on separate MIG partitions are (mostly) performance-isolated from each other. Not all GPUs support MIG, however.