Section 5.1 of the MPS documentation (https://docs.nvidia.com/deploy/mps/index.html) says that to enable MPS, we set the GPU to EXCLUSIVE_PROCESS compute mode. After doing so, we see that kernels launched by the same user are distributed across multiple SMs and run in parallel. In default mode (without MPS), they run in a time-sliced manner.
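For reference, this is roughly the setup sequence we used, following section 5.1. The GPU index (0) and single-GPU assumption are just for illustration:

```shell
# Run as root: put GPU 0 into EXCLUSIVE_PROCESS compute mode,
# so all CUDA work on the device goes through the MPS server.
nvidia-smi -i 0 -c EXCLUSIVE_PROCESS

# Start the MPS control daemon; CUDA applications started
# afterwards connect to it as MPS clients automatically.
export CUDA_VISIBLE_DEVICES=0
nvidia-cuda-mps-control -d

# ... launch the CUDA applications under test here ...

# Shut MPS down and restore the default compute mode when done.
echo quit | nvidia-cuda-mps-control
nvidia-smi -i 0 -c DEFAULT
```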
However, when we try launching kernels from different users, one user's kernel is not launched until the other user's kernel has finished. This is where my understanding of MPS on Turing is shaky. Since Turing is similar to Volta in this respect (https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf), I believed that "MPS clients" meant jobs launched by different users, but that does not appear to be the case: those jobs run serially. When MPS is disabled, the jobs from different users are still accepted, but, as answered in another discussion (https://stackoverflow.com/questions/34709749/how-do-i-use-nvidia-multi-process-service-mps-to-run-multiple-non-mpi-cuda-app), they are time-shared rather than run concurrently.
So, in effect, with MPS on the Turing architecture, jobs from two different users cannot run in parallel even when resources are available. Will jobs from multiple users always be serialized?
Thanks in advance.