Concurrent execution with MPS

I have some questions after reading the MPS manual.
1- It seems that the variable to control the number of clients is CUDA_MPS_ACTIVE_THREAD_PERCENTAGE. So, if I have two MPI processes, I have to set that variable to 50. Is that correct?

2- I read your answers here and here. It seems that MPS works with multiple processes offloading to the GPU, e.g., two processes that each launch one kernel. What about one process with two kernels? For example, a machine learning program may have a single Python process that runs multiple kernels on the GPU. Is MPS beneficial in this case?
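
To make question 1 concrete, here is a minimal sketch of the setup I have in mind, using Python's subprocess module in place of a real MPI launcher. The helper name launch_clients and the placeholder child command are hypothetical; a real run would start the GPU program instead and would need the MPS control daemon to be running already. The value 50 for two clients is my assumption, and it is exactly the part I am asking about.

```python
import os
import subprocess
import sys


def launch_clients(num_clients, percentage):
    """Start num_clients child processes that all inherit the same MPS thread limit.

    Hypothetical helper for illustration only; it does not touch the GPU.
    """
    env = dict(os.environ)
    # As I understand the manual, the variable must be in the client's
    # environment before the process starts, so it is set here in the parent.
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(percentage)

    # Placeholder workload: each child only echoes the value it inherited.
    # A real client would be the CUDA or MPI program itself.
    child_code = "import os; print(os.environ['CUDA_MPS_ACTIVE_THREAD_PERCENTAGE'])"

    procs = [
        subprocess.Popen(
            [sys.executable, "-c", child_code],
            env=env,
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(num_clients)
    ]
    # Collect what each child saw in its environment.
    return [p.communicate()[0].strip() for p in procs]


if __name__ == "__main__":
    # Two clients, each with the limit set to 50 (my assumption, see question 1).
    print(launch_clients(2, 50))
```

Is setting the variable this way, once per client process, what the manual intends?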