MPS-enabled process is slower than MPS-disabled process.

snehasubramanian1993 · December 4, 2018, 6:04am

System Config:
NVIDIA TITAN XP. Cuda compilation tools, release 9.0, V9.0.176

I ran two models (an RNN and a CNN) as two processes on a single GPU in 1) the default compute mode (thread parallelization) and 2) with the Multi-Process Service (MPS) enabled.

1) shows a lower run time than 2). My understanding was that MPS enables kernel-level parallelism, so I'd expect 2) to be faster than 1). Can someone please let me know if I am missing something, and why I observe 1) to be faster than 2)? Is there some additional configuration that needs to be enabled for these modes?

njuffa · December 4, 2018, 7:11am

Not my area of expertise. Robert Crovella probably can provide better insights. Two thoughts to ponder:

(1) If each process can keep the GPU busy by itself (which seems likely for a deep-learning application), kernel parallelism cannot increase overall throughput.

(2) MPS adds overhead: multiple processes share a resource, namely a GPU context, and this requires coordination.
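
To make the two points concrete, here is a rough toy model in Python. It is only a sketch under stated assumptions, not a measurement or a description of how the GPU scheduler actually works. The function names, the utilization fractions, and the 5% overhead figure are all invented for illustration. It assumes that without MPS the two processes simply take turns, and that with MPS they can overlap only to the extent that each leaves part of the GPU idle.

```python
def time_sliced_runtime(t_a, t_b):
    """Toy model of the default mode: the two processes take turns on the
    GPU, so their solo run times simply add up."""
    return t_a + t_b


def mps_runtime(t_a, t_b, util_a, util_b, overhead=0.05):
    """Toy model of running both processes concurrently under MPS.

    util_a / util_b: assumed fraction of the GPU each process keeps busy
    when running alone (hypothetical numbers, not measured).
    overhead: assumed fractional cost of coordinating the shared context.
    """
    # Total work, expressed in "full-GPU seconds".
    work = t_a * util_a + t_b * util_b
    # The pair cannot finish before the longer process would finish alone,
    # nor before all of the work has been executed.
    ideal = max(t_a, t_b, work)
    return ideal * (1.0 + overhead)


baseline = time_sliced_runtime(10.0, 10.0)
# Each process saturates the GPU on its own: nothing left to overlap.
saturated = mps_runtime(10.0, 10.0, util_a=1.0, util_b=1.0)
# Each process leaves most of the GPU idle: overlap can pay off.
light = mps_runtime(10.0, 10.0, util_a=0.3, util_b=0.3)

print(f"time-sliced:     {baseline:.1f} s")
print(f"MPS, saturated:  {saturated:.1f} s")
print(f"MPS, light load: {light:.1f} s")
```

In this model the saturated case comes out slightly slower with MPS than without, because there is no idle capacity to fill and only the overhead remains. MPS wins only in the light-load case. Whether the RNN and CNN here fall into the saturated case would have to be checked with a profiler.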
