MPS is not working

kimwonhoi93 · June 18, 2021, 2:22am

I’m using MPS to run multiple programs on one GPU, but MPS doesn’t seem to be working normally.
I am using nvidia-docker and set up MPS inside the container by following this section of the documentation:

“5.1.1.1. Starting MPS control daemon”

Performance with MPS is no better than it was without MPS. So I read ‘control.log’ and ‘server.log’, and their format differs from what is described in “4.3. MPS Logging Format”.

The contents of ‘control.log’ and ‘server.log’ are below; please check whether I set up MPS correctly. I am using an RTX 3080 10GB on Ubuntu 18.04.

control.log

[2021-06-18 01:03:31.887 Control 31] Start
[2021-06-18 01:05:36.843 Control 31] Accepting connection…
[2021-06-18 01:05:36.843 Control 31] User did not send valid credentials
[2021-06-18 01:05:36.843 Control 31] Accepting connection…
[2021-06-18 01:05:36.843 Control 31] NEW CLIENT 52 from user 0: Server is not ready, push client to pending list
[2021-06-18 01:05:36.843 Control 31] Starting new server 217 for user 0
[2021-06-18 01:05:36.844 Control 31] Accepting connection…
[2021-06-18 01:05:36.891 Control 31] NEW SERVER 217: Ready
[2021-06-18 01:05:36.892 Control 31] Accepting connection…
[2021-06-18 01:05:36.892 Control 31] User did not send valid credentials
[2021-06-18 01:05:36.892 Control 31] Accepting connection…
[2021-06-18 01:05:36.892 Control 31] User did not send valid credentials
[2021-06-18 01:05:36.892 Control 31] Accepting connection…
[2021-06-18 01:05:36.892 Control 31] NEW CLIENT 58 from user 0: Server already exists
[2021-06-18 01:05:36.892 Control 31] Accepting connection…
[2021-06-18 01:05:36.892 Control 31] NEW CLIENT 46 from user 0: Server already exists
[2021-06-18 01:05:36.937 Control 31] Accepting connection…
[2021-06-18 01:05:36.938 Control 31] NEW CLIENT 52 from user 0: Server already exists
[2021-06-18 01:05:36.938 Control 31] Accepting connection…
[2021-06-18 01:05:36.938 Control 31] NEW CLIENT 58 from user 0: Server already exists
[2021-06-18 01:05:36.938 Control 31] Accepting connection…
[2021-06-18 01:05:36.938 Control 31] NEW CLIENT 46 from user 0: Server already exists
[2021-06-18 01:42:41.225 Control 31] Accepting connection…
[2021-06-18 01:42:41.225 Control 31] NEW UI
[2021-06-18 01:42:41.225 Control 31] Cmd:quit
[2021-06-18 01:42:41.309 Control 31] Server 217 exited with status 0
[2021-06-18 01:42:41.309 Control 31] Removed Shm file at /cuda.shm.0.d9.1
[2021-06-18 01:42:41.309 Control 31] Removed Shm file at /cuda.shm.0.d9.2
[2021-06-18 01:42:41.309 Control 31] Exit with status 0

server.log

[2021-06-18 01:05:36.844 Other 217] Start
[2021-06-18 01:05:36.857 Other 217] Volta MPS: Creating server context on device 0
[2021-06-18 01:05:36.891 Other 217] activeThreadsPercentage set to 100.000000
[2021-06-18 01:05:36.891 Other 217] MPS Server is started
[2021-06-18 01:05:36.891 Other 217] Volta MPS Server: Received new client request
[2021-06-18 01:05:36.891 Other 217] MPS Server: worker created
[2021-06-18 01:05:36.892 Other 217] Volta MPS: Creating worker thread
[2021-06-18 01:05:36.892 Other 217] Volta MPS Server: Received new client request
[2021-06-18 01:05:36.892 Other 217] MPS Server: worker created
[2021-06-18 01:05:36.892 Other 217] Volta MPS: Creating worker thread
[2021-06-18 01:05:36.892 Other 217] Volta MPS Server: Received new client request
[2021-06-18 01:05:36.892 Other 217] MPS Server: worker created
[2021-06-18 01:05:36.892 Other 217] Volta MPS: Creating worker thread
[2021-06-18 01:05:36.938 Other 217] Volta MPS Server: Received new client request
[2021-06-18 01:05:36.938 Other 217] MPS Server: worker created
[2021-06-18 01:05:36.938 Other 217] Volta MPS: Creating worker thread
[2021-06-18 01:05:36.938 Other 217] Volta MPS: Device GeForce RTX 3080 (uuid 0x1fabc583-0xa80f0a19-0x8b8cbeca-0xc214ca5a) is associated
[2021-06-18 01:05:36.938 Other 217] Volta MPS Server: Received new client request
[2021-06-18 01:05:36.938 Other 217] MPS Server: worker created
[2021-06-18 01:05:36.938 Other 217] Volta MPS: Creating worker thread
[2021-06-18 01:05:36.938 Other 217] Volta MPS: Device GeForce RTX 3080 (uuid 0x1fabc583-0xa80f0a19-0x8b8cbeca-0xc214ca5a) is associated
[2021-06-18 01:05:36.938 Other 217] Volta MPS Server: Received new client request
[2021-06-18 01:05:36.938 Other 217] MPS Server: worker created
[2021-06-18 01:05:36.938 Other 217] Volta MPS: Creating worker thread
[2021-06-18 01:05:36.938 Other 217] Volta MPS: Device GeForce RTX 3080 (uuid 0x1fabc583-0xa80f0a19-0x8b8cbeca-0xc214ca5a) is associated
[2021-06-18 01:42:23.603 Other 217] Receive command failed, assuming client exit
[2021-06-18 01:42:23.603 Other 217] Volta MPS: Client disconnected. Number of active client contexts is 2
[2021-06-18 01:42:23.611 Other 217] Receive command failed, assuming client exit
[2021-06-18 01:42:23.611 Other 217] Volta MPS: Client process disconnected
[2021-06-18 01:42:23.629 Other 217] Receive command failed, assuming client exit
[2021-06-18 01:42:23.629 Other 217] Volta MPS: Client disconnected. Number of active client contexts is 1
[2021-06-18 01:42:23.629 Other 217] Receive command failed, assuming client exit
[2021-06-18 01:42:23.629 Other 217] Volta MPS: Client disconnected. Number of active client contexts is 0
[2021-06-18 01:42:23.693 Other 217] Receive command failed, assuming client exit
[2021-06-18 01:42:23.693 Other 217] Receive command failed, assuming client exit
[2021-06-18 01:42:23.693 Other 217] Volta MPS: Client process disconnected
[2021-06-18 01:42:23.693 Other 217] Volta MPS: Client process disconnected
[2021-06-18 01:42:41.225 Other 217] Waiting for current clients to finish
[2021-06-18 01:42:41.225 Other 217] Exit
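
For anyone who wants to check a long server.log without reading it line by line, here is a minimal Python sketch that tallies client events. It assumes the line shape seen in the log above, `[<timestamp> <component> <pid>] <message>`, and matches only message strings that appear in that log. The function name and the counter keys are made up for illustration; this is not an NVIDIA tool.

```python
import re
from collections import Counter

# Line shape taken from the log above: "[<date> <time> <component> <pid>] <message>"
LOG_LINE = re.compile(r"^\[(\S+ \S+) (\w+) (\d+)\] (.*)$")


def summarize_server_log(text):
    """Count server-context, client-request and disconnect lines in server.log text."""
    counts = Counter()
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # ignore blank lines and anything that is not a log record
        message = match.group(4)
        if "Creating server context" in message:
            counts["server_contexts"] += 1
        elif "Received new client request" in message:
            counts["client_requests"] += 1
        elif "Client disconnected" in message or "Client process disconnected" in message:
            counts["disconnects"] += 1
    return counts
```

Calling `summarize_server_log(open("server.log").read())` (the path is an assumption, it depends on your MPS log directory) gives a quick view of how many client requests actually reached the MPS server. A non-zero `client_requests` count suggests the programs really are connecting through MPS rather than bypassing it.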

Robert_Crovella · June 18, 2021, 2:26pm

The logs look correct to me.

kimwonhoi93 · June 19, 2021, 2:51am

Thank you for replying to this topic!

If the logs are correct and MPS is working normally, I expected running multiple programs at once to be faster than it was before using MPS. However, the speed with multiple programs running was the same as before.

Do I have to do anything else within the program besides setting up the MPS server?
My program is an object detection program that uses a TensorFlow SavedModel. It is written in C++.

Is there any alternative that works like MPS?
MPS is a must-have feature for me… Please let me know how to solve this. I want to speed up running multiple programs on one GPU.

Robert_Crovella · June 19, 2021, 1:30pm

I don’t know of anywhere in the NVIDIA documentation where that is written. MPS doesn’t automatically speed things up. MPS may give faster performance in some cases, not in all cases.

njuffa · June 20, 2021, 1:40am

In my understanding, the main purpose of MPS is to increase GPU utilization when multiple processes share a GPU, e.g. multiple MPI ranks mapped to one GPU.

When high GPU utilization is already achieved without the use of MPS, it is entirely possible, and in fact likely, that overall performance does not increase when using MPS. It is even conceivable (although I have yet to personally observe such a case) that performance could decrease slightly with MPS due to the overhead of orchestrating access to a shared resource among multiple users (many-to-one context mapping).
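
One rough way to check whether the GPU is already busy before MPS is enabled is to sample the utilization that `nvidia-smi` reports while the programs are running. The Python sketch below does that under some assumptions: the helper names and the 90% threshold are arbitrary choices for illustration, and `utilization.gpu` only reports the fraction of time at least one kernel was executing, so a high value is a hint rather than proof that the GPU has no spare capacity.

```python
import subprocess


def parse_utilization(output):
    """Parse one integer percentage per GPU from nvidia-smi CSV output."""
    values = []
    for line in output.splitlines():
        field = line.strip()
        if field.isdigit():  # skip blank lines and non-numeric values
            values.append(int(field))
    return values


def sample_gpu_utilization():
    """Return the current utilization (%) of each GPU as reported by nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_utilization(result.stdout)


def likely_saturated(samples, threshold=90):
    """Heuristic only: True if the mean of the samples reaches the (assumed) threshold."""
    return bool(samples) and sum(samples) / len(samples) >= threshold
```

For example, collecting `sample_gpu_utilization()[0]` once a second for a minute while the workload runs without MPS, then passing the list to `likely_saturated`, gives a first indication of whether there is idle time for MPS to fill.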

chenmzh · March 20, 2022, 7:37am

Hi, Robert!
When I execute the command “nvidia-cuda-mps-control -d”, I cannot find the MPS control daemon process. What is the problem, and how can it be solved?
Many thanks in advance!

Robert_Crovella · March 20, 2022, 11:04pm

I don’t really know what that means. Do you mean that you get a message from the Linux OS that it cannot find the file nvidia-cuda-mps-control? Perhaps it would be better if you showed the console session that demonstrates the issue (copy/paste the text).

sungin.h · July 13, 2022, 7:18am

If you are looking for the nvidia-cuda-mps-server process rather than nvidia-cuda-mps-control, run any CUDA program and then check again.
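
This matches the control.log earlier in the thread, where "Starting new server 217 for user 0" appears only after the first "NEW CLIENT" line. As a minimal sketch of how to check which of the two processes is present, the Python below scans `ps -eo pid,args` output for the two binary names mentioned in this thread. The function names are hypothetical, and when MPS runs inside a container, `ps` has to be run where those processes are visible.

```python
import subprocess

MPS_BINARIES = ("nvidia-cuda-mps-control", "nvidia-cuda-mps-server")


def find_mps_processes(ps_output):
    """Map each MPS binary name to the PIDs whose command line starts with it."""
    found = {name: [] for name in MPS_BINARIES}
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skips the "PID COMMAND" header and malformed lines
        pid, args = int(parts[0]), parts[1]
        # Compare only the executable's base name, ignoring its directory and arguments.
        executable = args.split()[0].rsplit("/", 1)[-1]
        if executable in found:
            found[executable].append(pid)
    return found


def list_mps_processes():
    """Run ps and report the MPS control daemon and server PIDs, if any."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return find_mps_processes(result.stdout)
```

If `list_mps_processes()` shows a PID for nvidia-cuda-mps-control but none for nvidia-cuda-mps-server, that is likely just the state before any CUDA client has connected.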