CUDA MPS Server blocks applications from running on different GPUs in a multi-GPU environment

bharath.sekarp · October 27, 2017, 3:22pm

The CUDA MPS server blocks applications from starting on other GPUs, even though the MPS server is configured to run only on one particular GPU.

Here’s the scenario:

The server has 7 P100 GPUs (0-6), and the CUDA MPS server is started on GPU 0. Another application (based on Theano and Keras) is launched on GPU 3, but it never starts. When the CUDA MPS server is stopped, applications run properly on the different GPUs. The applications are exposed to particular GPUs using the CUDA_VISIBLE_DEVICES variable.

Here’s what I find in the MPS control log:

[2017-10-27 08:08:51.723 Control 24078] Starting new server 24232 for user 1002
[2017-10-27 08:08:53.168 Control 24078] Accepting connection…
[2017-10-27 08:08:53.169 Control 24078] NEW SERVER 24232: Ignoring connection from user
[2017-10-27 08:08:53.919 Control 24078] Server 24232 exited with status 0
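
For anyone scanning a longer control log for the same start-then-exit pattern, a minimal shell sketch like the one below can pull out the server exit events. It assumes the line format shown in the excerpt above; the function name `mps_server_exits` is illustrative, and the log path in the comment is an assumption that should be checked against the CUDA_MPS_LOG_DIRECTORY in use.

```shell
# Print "<server pid> <exit status>" for every server exit recorded in an
# MPS control log read from stdin. Assumes the line format shown above.
mps_server_exits() {
    sed -n 's/.*Server \([0-9][0-9]*\) exited with status \([0-9][0-9]*\).*/\1 \2/p'
}

# Example (path is an assumption; adjust to your CUDA_MPS_LOG_DIRECTORY):
#   mps_server_exits < /var/log/nvidia-mps/control.log
```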

Has anyone encountered an issue like this?

Any help on this is greatly appreciated.

The CUDA version in our environment is 8.0 (v8.0.61).

Regards
Bharath

Robert_Crovella · October 27, 2017, 6:58pm

Have you specified CUDA_VISIBLE_DEVICES when starting the MPS server?
Have you placed the necessary GPUs in exclusive process mode?

Are the non-MPS-managed GPUs in default compute mode?
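
The compute-mode questions above can be checked from the command line. The sketch below assumes `nvidia-smi` is on the PATH and that its `compute_mode` query field reports `Default` for GPUs in default mode; the helper name `non_default_gpus` is illustrative, not part of any NVIDIA tool.

```shell
# Print the index of every GPU whose compute mode is not "Default".
# Reads "index, compute_mode" CSV lines (no header) on stdin.
non_default_gpus() {
    awk -F', *' '$2 != "Default" { print $1 }'
}

# Usage on a machine with the NVIDIA driver installed:
#   nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader | non_default_gpus
#
# Changing the mode of a single GPU (requires root):
#   nvidia-smi -i 0 -c EXCLUSIVE_PROCESS   # GPU intended for MPS
#   nvidia-smi -i 3 -c DEFAULT             # GPU intended for non-MPS work
```

If only the MPS-managed GPU is meant to be in exclusive process mode, the helper should print just that GPU's index.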

bharath.sekarp · October 27, 2017, 7:40pm

Yes, I specify CUDA_VISIBLE_DEVICES and place the specific GPU in exclusive_process mode.

Scenario 1:

I run the MPS server as root and place the specific GPU in exclusive mode.

An update on this issue: when standalone processes are run with CUDA_VISIBLE_DEVICES=0, they run as clients of the MPS server.

When CUDA_VISIBLE_DEVICES is set to an index greater than 0, the processes error out, reporting that no GPU is available.

Scenario 2:

Now I run the MPS server as a non-root user and leave the GPU in default mode instead of exclusive mode.

With the following environment variables set:

export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log

the processes eventually connect to the MPS server.

However, processes that are run on other GPUs (by setting the appropriate CUDA_VISIBLE_DEVICES) are blocked and hang.
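
Note that `/tmp/nvidia-mps` is also the documented default pipe directory, so CUDA processes started without CUDA_MPS_PIPE_DIRECTORY look in the same place. One thing that may be worth trying (a sketch under that assumption, not a confirmed fix for this thread) is to give the daemon a GPU-specific pipe directory and set it only for the clients meant to use MPS. The helper `use_mps_gpu`, the `/tmp/mps-gpuN-*` paths, and `./my_app` are all hypothetical.

```shell
# Hypothetical helper: point the current shell at a per-GPU MPS instance.
# The /tmp/mps-gpuN-* paths are illustrative, not defaults.
use_mps_gpu() {
    gpu="$1"
    export CUDA_VISIBLE_DEVICES="$gpu"
    export CUDA_MPS_PIPE_DIRECTORY="/tmp/mps-gpu${gpu}-pipe"
    export CUDA_MPS_LOG_DIRECTORY="/tmp/mps-gpu${gpu}-log"
}

# Daemon for GPU 0 only (run in its own shell):
#   use_mps_gpu 0 && nvidia-cuda-mps-control -d
# Client meant to use MPS on GPU 0:
#   use_mps_gpu 0 && ./my_app          # my_app is a placeholder
# Process meant to run on GPU 3 without MPS: point CUDA_MPS_PIPE_DIRECTORY
# at a directory where no daemon is listening (assumption to verify):
#   CUDA_VISIBLE_DEVICES=3 CUDA_MPS_PIPE_DIRECTORY=/tmp/no-mps ./my_app
# Stop the GPU 0 daemon:
#   use_mps_gpu 0 && echo quit | nvidia-cuda-mps-control
```

The pipe directory has to match between the control daemon and its clients, which is why the same helper is used for both.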

bharath.sekarp · November 3, 2017, 3:45pm

Does anyone have any suggestions or updates?

bharath.sekarp · November 22, 2017, 9:13pm

We upgraded the driver to the latest version (384.66), but the issue persists.

Any help on this would be great.

Regards
Bharath
