Hyper-Q for sharing GPUs

What kind of control do we have over the processes sharing a GPU when using Hyper-Q (Multi-Process Service, MPS)?

Is it possible to control the level of parallelism of the different processes sharing a GPU? That is, is there a simple way to define the maximum number of threads/warps each “MPS client” (process) can have?

Or is it only controllable inside the application's own implementation?

The manual does not say anything about this: https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf


No, MPS doesn’t give you any control over the behavior of individual clients.
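That leaves the application itself as the place to limit parallelism. A minimal sketch of one way to do that, assuming a cap that the process picks for itself: the names `kMaxBlocks`, `kThreadsPerBlock`, and `scale` are hypothetical and have nothing to do with MPS. The kernel uses a grid-stride loop, so it stays correct however few blocks are launched.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-process cap chosen by the application (not an MPS setting).
constexpr int kMaxBlocks = 8;
constexpr int kThreadsPerBlock = 128;

// Grid-stride loop: every element is processed even if the grid is
// much smaller than the problem size.
__global__ void scale(float* data, int n, float factor) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        data[i] *= factor;
    }
}

int main() {
    const int n = 1 << 20;
    std::vector<float> host(n, 1.0f);

    float* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch no more blocks than the self-imposed cap allows.
    int needed = (n + kThreadsPerBlock - 1) / kThreadsPerBlock;
    int blocks = needed < kMaxBlocks ? needed : kMaxBlocks;
    scale<<<blocks, kThreadsPerBlock>>>(dev, n, 2.0f);

    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return 1;
    }

    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // Check that the capped grid still covered the whole array.
    for (int i = 0; i < n; ++i) {
        if (host[i] != 2.0f) {
            std::fprintf(stderr, "mismatch at %d\n", i);
            return 1;
        }
    }
    std::printf("ok: %d blocks x %d threads\n", blocks, kThreadsPerBlock);
    return 0;
}
```

This only bounds what this one kernel launch asks for (at most `kMaxBlocks * kThreadsPerBlock` threads resident at a time). It does not reserve or partition the GPU, and other MPS clients are free to launch as much work as they like.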