Hello,
I have a question about interference between clients of MPS (Multi-Process Service). I set the MPS active thread percentage to 50% for each process. According to the NVIDIA MPS documentation, clients should not disturb each other much, since each one is confined to a subset of the SMs. However, the computation latency increases when I run more than one client process.
For example, when I run just one client process at 50% MPS percentage, the latency of a single forward computation is 100 ms. However, when I run two client processes, each at 50% MPS percentage, the latency of a single forward computation increases to 110 ms on client 1 and 140 ms on client 2.
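A minimal Python sketch of this kind of measurement, under stated assumptions rather than the exact benchmark used above. The helper names `measure_latency_ms` and `slowdown_percent` and the placeholder workload are hypothetical, the percentage is assumed to be set through the `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` environment variable, and `forward` is assumed to block until the GPU work has finished (for example by synchronizing the device before returning).

```python
import os
import statistics
import time

# Assumption: the per-client limit is set via this environment variable.
# It has to be in place before the process creates its CUDA context.
os.environ.setdefault("CUDA_MPS_ACTIVE_THREAD_PERCENTAGE", "50")


def measure_latency_ms(forward, warmup=5, iters=20):
    """Return the median wall-clock latency of forward() in milliseconds.

    forward is a hypothetical callable that runs one forward computation
    and returns only after the GPU work is complete.
    """
    for _ in range(warmup):
        forward()  # warm-up runs are not timed
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        forward()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def slowdown_percent(baseline_ms, contended_ms):
    """Relative latency increase over the single-client baseline, in percent."""
    return 100.0 * (contended_ms - baseline_ms) / baseline_ms


if __name__ == "__main__":
    # Placeholder CPU workload standing in for the real forward pass.
    latency = measure_latency_ms(lambda: sum(range(100_000)))
    print(f"median latency: {latency:.3f} ms")

    # Numbers reported above: 100 ms alone, 110 ms and 140 ms with two clients.
    print(slowdown_percent(100.0, 110.0))  # 10.0
    print(slowdown_percent(100.0, 140.0))  # 40.0
```

Run once with a single client and once with two clients started at the same time; with the numbers above, the increase is 10% for client 1 and 40% for client 2.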
I suspect it has something to do with bandwidth, but I would like to know the actual reason.
Also, is there any way to estimate the increase in computation latency in advance?