Nvidia-cuda-mps-server vs. multithreading

I have a job with a number of CUDA kernels that can process data in either single-threaded or multi-threaded mode. When multi-threaded, each thread has its own CUDA stream, and all kernel launches and memory operations are associated with that stream. There are no mutexes or locks in the code, so the threads are essentially independent of each other. The job doesn't make particularly efficient use of the GPU, as the workloads are small.
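To be concrete, the multi-threaded layout is roughly the following sketch: each host thread creates its own stream and issues all of its work there. The names (`worker`, `dummyKernel`, the element count) are illustrative, not from my actual job.

```cuda
#include <cuda_runtime.h>
#include <thread>
#include <vector>

__global__ void dummyKernel(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;   // stand-in for the real per-item work
}

void worker(int n) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);            // one private stream per host thread

    float* d;
    cudaMallocAsync(&d, n * sizeof(float), stream);
    dummyKernel<<<(n + 255) / 256, 256, 0, stream>>>(d, n);
    cudaFreeAsync(d, stream);

    cudaStreamSynchronize(stream);        // no locks: threads never interact
    cudaStreamDestroy(stream);
}

int main() {
    std::vector<std::thread> pool;
    for (int t = 0; t < 10; ++t)          // 10 independent worker threads
        pool.emplace_back(worker, 1 << 20);
    for (auto& th : pool) th.join();
    return 0;
}
```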

Alternatively, I can run multiple concurrent single threaded jobs processing the same data using nvidia-cuda-mps-server.
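For reference, the MPS setup is the standard one, roughly as below. The daemon commands are the documented ones; `./my_job` and the pipe/log paths are placeholders for my actual binary and directories.

```shell
export CUDA_MPS_PIPE_DIRECTORY=/tmp/mps_pipe
export CUDA_MPS_LOG_DIRECTORY=/tmp/mps_log
nvidia-cuda-mps-control -d            # start the MPS control daemon

# launch 10 independent single-threaded jobs against the shared daemon
for i in $(seq 1 10); do ./my_job & done
wait

echo quit | nvidia-cuda-mps-control   # shut the daemon down
```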

Comparing the two, as the number of threads/processes increases, the scaling with the mps-server is MUCH, MUCH better. Because the GPU is not heavily stressed by this job, I can run 10 concurrent processes under the mps-server in about the same time as 1. With the multi-threaded implementation, however, a 10-thread job processing an equivalent amount of data takes about 6 times as long.

How is it that the nvidia-cuda-mps-server is so much better at sharing the GPU across processes than multiple independent CUDA streams running concurrently within a single process?

I see the same behaviour on an RTX 2080 Super and an A100.