What kind of control do we have over shared processes when using Hyper-Q (Multi-Process Service, MPS)?
Is it possible to control the level of parallelism of different processes when they share a GPU? That is, is there a simple way to define the maximum number of threads/warps each “MPS client” or process can have?
Or is it only controllable from within the application's own implementation?
The manual does not say anything about this: https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf