Can MPS control per-GPU QoS when multiple GPUs are managed by MPS?


I’m researching whether it is possible to control GPU QoS through MPS when I have multiple GPUs.

Say a process is going to use both GPU 0 and GPU 1, which are both managed by MPS. Is it possible to set the utilization limit on GPU 0 to 50% and on GPU 1 to 20%?

I found that CUDA_MPS_ACTIVE_THREAD_PERCENTAGE is the environment variable that controls per-client QoS, but how does it work if the process runs across multiple GPUs?
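
For context, here is a minimal sketch of how I understand the variable is applied today: it is set in the client process's environment before the process starts. The helper name `mps_client_env` and the binary `./my_cuda_app` are hypothetical, and I am assuming the variable takes a single percentage value, which is exactly why I don't see where a per-GPU limit would go.

```python
import os


def mps_client_env(percentage, base_env=None):
    """Return a copy of the environment with the MPS thread percentage set.

    Hypothetical helper: it only builds the environment for one MPS client.
    Assumption: the variable holds a single value in (0, 100], so it does
    not name a specific GPU.
    """
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in (0, 100]")
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_MPS_ACTIVE_THREAD_PERCENTAGE"] = str(percentage)
    return env


# Usage sketch (not run here; ./my_cuda_app is a placeholder client binary):
#   import subprocess
#   subprocess.run(["./my_cuda_app"], env=mps_client_env(50))
```

With this approach I can give one client 50% and another client 20%, but I don't see how to give a single client different limits on GPU 0 and GPU 1.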