CUDA MPS: cudaMemcpy Error 806 When Running Multiple MPS Servers

Machine Specs:

GPU: NVIDIA Grace Hopper (GH100 with 97GB memory)

Driver Version: 560.35.03

CUDA Version: 12.6

Issue Description:

I am running a script that starts an MPS server, using the environment variables CUDA_MPS_PIPE_DIRECTORY and CUDA_MPS_LOG_DIRECTORY with a /tmp/pipe/&lt;timestamp&gt; and /tmp/log/&lt;timestamp&gt; layout, so that each MPS server logs and pipes to a unique folder.
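Simplified, the launch step looks roughly like this (a sketch of what the script does; the helper name and the exact timestamp format are illustrative):

```python
import os
import subprocess
import time

def start_mps_server(timestamp: str) -> None:
    """Start an MPS control daemon with per-instance pipe/log directories."""
    pipe_dir = f"/tmp/pipe/{timestamp}"
    log_dir = f"/tmp/log/{timestamp}"
    os.makedirs(pipe_dir, exist_ok=True)
    os.makedirs(log_dir, exist_ok=True)

    env = os.environ.copy()
    env["CUDA_MPS_PIPE_DIRECTORY"] = pipe_dir
    env["CUDA_MPS_LOG_DIRECTORY"] = log_dir

    # -d detaches the control daemon into the background.
    subprocess.run(["nvidia-cuda-mps-control", "-d"], env=env, check=True)

start_mps_server(str(int(time.time())))
```

Each batch of clients is then started with the same CUDA_MPS_PIPE_DIRECTORY in its environment, so they attach to that particular daemon.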

After launching the MPS server, my script starts 25 clients that use MPS.

The issue arises when I run the script twice, launching 2 MPS servers with 30 clients each (total: 2 MPS servers, 60 clients). At this point, I encounter the following error:

cudaMemcpy failed: the remote procedural call between the MPS server and the MPS client failed (error code 806 / cudaErrorMpsRpcFailure)

I have verified that GPU memory is not exhausted. This error only occurs when running 2 scripts (2 MPS servers). Even if I reduce the number of clients per MPS server, the error persists.
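For completeness, this is roughly how I check memory usage and talk to each control daemon individually (a sketch; the pipe path is a placeholder for one of my timestamped directories):

```python
import os
import subprocess

def gpu_memory() -> str:
    """Report used/total GPU memory via nvidia-smi."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

def query_mps(pipe_dir: str, command: str) -> str:
    """Send a control command (e.g. 'get_server_list') to one MPS daemon."""
    env = os.environ.copy()
    env["CUDA_MPS_PIPE_DIRECTORY"] = pipe_dir  # selects which daemon answers
    return subprocess.run(
        ["nvidia-cuda-mps-control"],
        input=command + "\n",
        env=env, capture_output=True, text=True,
    ).stdout.strip()

print(gpu_memory())
print(query_mps("/tmp/pipe/<timestamp>", "get_server_list"))  # placeholder path
```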

Questions:

  1. Is there any known limitation or conflict when running multiple MPS servers? (I know the limit is 48 clients per GPU per MPS server.)

  2. Could this error be caused by how CUDA manages multiple MPS instances?

  3. Are there any debugging steps or configurations I should check?

Any insights would be greatly appreciated. Thanks in advance!

When using multiple MPS servers, one expectation I would have is that they don’t target the same GPUs. This should be fairly evident (I think) from my reading of the user guide: the relationship of a server to its GPU(s) is one of exclusive access.

Since you haven’t mentioned the number of GPUs involved, or how the MPS servers map to GPUs in the multi-server case, I don’t know whether this applies to your setup.
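If more than one GPU is available, the usual way to keep the servers apart is to restrict each control daemon to its own device with CUDA_VISIBLE_DEVICES before starting it. A minimal sketch (the directory names follow your scheme; the helper name is illustrative):

```python
import os
import subprocess

def start_mps_server_on_gpu(gpu_id: int, pipe_dir: str, log_dir: str) -> None:
    """Start one MPS control daemon that only sees a single GPU."""
    os.makedirs(pipe_dir, exist_ok=True)
    os.makedirs(log_dir, exist_ok=True)
    env = os.environ.copy()
    # Restrict this daemon (and the server it spawns) to one device;
    # CUDA_VISIBLE_DEVICES must be set before the daemon starts.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    env["CUDA_MPS_PIPE_DIRECTORY"] = pipe_dir
    env["CUDA_MPS_LOG_DIRECTORY"] = log_dir
    subprocess.run(["nvidia-cuda-mps-control", "-d"], env=env, check=True)

# One daemon per GPU, each with its own pipe/log directories.
for gpu in (0, 1):
    start_mps_server_on_gpu(gpu, f"/tmp/pipe/gpu{gpu}", f"/tmp/log/gpu{gpu}")
```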