MPS interference problem

Hello,
I have a question about interference between clients of MPS (Multi-Process Service). I set the MPS percentage to 50 for each process. According to the NVIDIA MPS documentation, clients should not disturb each other much, since each one is confined to its own set of SMs. However, the computation latency increases when I run more than one client process.
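For reference, here is a minimal sketch of how such a per-client limit might be set from a launcher script, assuming the limit is applied through the `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable and that the MPS control daemon is already running. The helper name `mps_client_env` and the script name `client.py` are hypothetical, not taken from my actual setup.

```python
import os


def mps_client_env(active_thread_percentage, base_env=None):
    """Return a copy of the environment with the MPS thread limit set."""
    if not 0 < active_thread_percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    env = dict(os.environ if base_env is None else base_env)
    # The variable must be in the client's environment before the process
    # creates its CUDA context.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(active_thread_percentage)
    return env


# Hypothetical usage: launch two clients, each limited to 50%.
# import subprocess
# procs = [subprocess.Popen(["python", "client.py"], env=mps_client_env(50))
#          for _ in range(2)]
```

Note that this limit only restricts the portion of SMs a client can use. It does not partition other resources.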

For example, when I run just one client process with a 50% MPS percentage, the latency of a single forward computation is 100 ms. However, when I run two client processes, each with a 50% MPS percentage, the latency of a single forward computation increases to 110 ms on client 1 and 140 ms on client 2.
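Expressed as relative slowdowns against the single-client baseline, these numbers work out as in the small sketch below. The function name `slowdown_percent` is only illustrative.

```python
def slowdown_percent(alone_ms, shared_ms):
    """Relative latency increase versus running alone, in percent."""
    return (shared_ms - alone_ms) * 100.0 / alone_ms


baseline_ms = 100.0  # one client at 50%
shared_ms = {"client 1": 110.0, "client 2": 140.0}  # two clients at 50% each

slowdowns = {name: slowdown_percent(baseline_ms, ms)
             for name, ms in shared_ms.items()}
for name, pct in slowdowns.items():
    print(f"{name}: +{pct:.0f}%")
# prints:
# client 1: +10%
# client 2: +40%
```

So the two clients are not slowed down equally, even though both have the same percentage.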

I think it has something to do with bandwidth, but I want to know the reason for sure.
Also, is there any way to calculate the increase in computation latency in advance?

One possible source would be contention for memory bandwidth. There might be other possibilities, such as host<->device bandwidth or perhaps other shared resources.

You can use a profiler to help discover how the applications are behaving in each case, and which subsystems are being used. With no description or measurement of your application(s), it's not possible to give the reason “for sure”.
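Alongside profiler output, it may help to collect repeatable latency numbers for the one-client and two-client cases. The following is only a sketch of a timing harness, not a replacement for a profiler. `run_once` and `sync` are hypothetical callables that you would supply: `run_once` performs one forward computation, and `sync` blocks until queued GPU work has finished (with PyTorch, for instance, `torch.cuda.synchronize` could be passed).

```python
import statistics
import time


def measure_latency_ms(run_once, sync=lambda: None, warmup=3, iters=20):
    """Return (median, worst) wall-clock latency of run_once in milliseconds."""
    # Warm-up iterations keep one-time costs (context creation, lazy
    # initialization) out of the measured samples.
    for _ in range(warmup):
        run_once()
    sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        sync()  # wait for asynchronous GPU work so the timer covers it
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), max(samples)
```

Running the same harness in each client, first alone and then concurrently, gives numbers that can be compared directly with what the profiler shows for each case.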

I don’t know of a method to predict computation latency with no measurements of the application. Others may have ideas.