MPS on Turing architecture (GeForce RTX 2080) for jobs from multiple users

Hi,

In section 5.1 of the MPS documentation https://docs.nvidia.com/deploy/mps/index.html, we see that to enable MPS we set the GPU to EXCLUSIVE_PROCESS compute mode. After doing so, we see that kernels launched by the same user are distributed across multiple SMs and run in parallel. In the default compute mode, they run in a time-sliced manner.
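For reference, the setup sequence described above can be sketched as follows. This is a minimal sketch, assuming device index 0 and that the commands are run with administrator privileges; it must be adapted to the actual system.

```shell
# Set the GPU to exclusive-process compute mode (device 0 assumed).
nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# Start the MPS control daemon; CUDA applications started afterwards
# connect to the MPS server automatically.
nvidia-cuda-mps-control -d

# ... launch CUDA jobs here ...

# Shut down the MPS daemon when finished.
echo quit | nvidia-cuda-mps-control
```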

When we try launching kernels from different users, one user’s kernel is not launched until the other user’s kernel has finished. Here is where my understanding of MPS on Turing is shaky. Since Turing is architecturally similar to Volta https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf, I believed that ‘mps_clients’ could mean jobs launched by different users, but that is not the case: the jobs run in a serialized manner. When MPS is disabled, the jobs are still accepted, but, as answered in another NVIDIA discussion, they are time-shared: https://stackoverflow.com/questions/34709749/how-do-i-use-nvidia-multi-process-service-mps-to-run-multiple-non-mpi-cuda-app.

So in a sense, jobs from two different users cannot run in parallel under MPS on the Turing architecture, even when resources are available. Will multiple users’ jobs always be serialized?

Thanks in advance.

Yes. For concurrency, all jobs using MPS must be submitted by the same user.

Read section 2.3.1.1:

https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf

Thanks Robert. Just to confirm: this is the current state of MPS irrespective of the architecture, i.e. Pascal, Volta, Turing. MPS exists for the specific use case of MPI jobs running on a GPU, which are highly likely to be launched by the same (Linux) user.
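For context, the MPI use case mentioned above looks roughly like this: all ranks belong to the same Linux user, so they can share a single MPS server and run concurrently on the GPU. This is an illustrative sketch; `./my_mpi_app` and the rank count are assumptions, not from the thread.

```shell
# With the MPS daemon already running, launch a multi-rank MPI job.
# All ranks run as the same user, so their kernels can be funneled
# through one MPS server and execute concurrently on the GPU.
mpirun -np 4 ./my_mpi_app   # ./my_mpi_app is a hypothetical CUDA+MPI binary
```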

Thanks.

That’s the way I read section 2.3.1.1.
I don’t see any qualifiers in there for different architectures.

Correct, the most important use-case and design target for MPS is MPI codes.