CUDA MPS: cudaMemcpy Error 806 When Running Multiple MPS Servers

Machine Specs:

GPU: NVIDIA Grace Hopper (GH100 with 97GB memory)

Driver Version: 560.35.03

CUDA Version: 12.6

Issue Description:

I am running a script that starts an MPS server, using the environment variables CUDA_MPS_PIPE_DIRECTORY and CUDA_MPS_LOG_DIRECTORY with a /tmp/pipe/&lt;timestamp&gt; and /tmp/log/&lt;timestamp&gt; layout, so that each MPS server logs and pipes to a unique folder.
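Simplified, the launch step looks roughly like this (a sketch of what the script does; the helper name and the exact timestamp format are illustrative):

```python
import os
import subprocess
import time

def start_mps_server(timestamp: str) -> None:
    """Start an MPS control daemon with per-instance pipe/log directories."""
    pipe_dir = f"/tmp/pipe/{timestamp}"
    log_dir = f"/tmp/log/{timestamp}"
    os.makedirs(pipe_dir, exist_ok=True)
    os.makedirs(log_dir, exist_ok=True)

    env = os.environ.copy()
    env["CUDA_MPS_PIPE_DIRECTORY"] = pipe_dir
    env["CUDA_MPS_LOG_DIRECTORY"] = log_dir

    # -d detaches the control daemon into the background.
    subprocess.run(["nvidia-cuda-mps-control", "-d"], env=env, check=True)

start_mps_server(str(int(time.time())))
```

Each batch of clients is then started with the same CUDA_MPS_PIPE_DIRECTORY in its environment, so they attach to that particular daemon.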

After launching the MPS server, my script starts 25 clients that use MPS.

The issue arises when I run the script twice, launching 2 MPS servers with 30 clients each (total: 2 MPS servers, 60 clients). At this point, I encounter the following error:

cudaMemcpy failed: the remote procedural call between the MPS server and the MPS client failed (error code 806 / cudaErrorMpsRpcFailure)

I have verified that GPU memory is not exhausted. This error only occurs when running 2 scripts (2 MPS servers). Even if I reduce the number of clients per MPS server, the error persists.
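For completeness, this is roughly how I check memory usage and talk to each control daemon individually (a sketch; the pipe path is a placeholder for one of my timestamped directories):

```python
import os
import subprocess

def gpu_memory() -> str:
    """Report used/total GPU memory via nvidia-smi."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()

def query_mps(pipe_dir: str, command: str) -> str:
    """Send a control command (e.g. 'get_server_list') to one MPS daemon."""
    env = os.environ.copy()
    env["CUDA_MPS_PIPE_DIRECTORY"] = pipe_dir  # selects which daemon answers
    return subprocess.run(
        ["nvidia-cuda-mps-control"],
        input=command + "\n",
        env=env, capture_output=True, text=True,
    ).stdout.strip()

print(gpu_memory())
print(query_mps("/tmp/pipe/<timestamp>", "get_server_list"))  # placeholder path
```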

Questions:

  1. Is there any known limitation or conflict when running multiple MPS servers? (I know the limit is 48 clients per GPU per MPS server.)

  2. Could this error be caused by how CUDA manages multiple MPS instances?

  3. Are there any debugging steps or configurations I should check?

Any insights would be greatly appreciated. Thanks in advance!

When using multiple MPS servers, one expectation I would have is that they don’t target the same GPUs. This should be fairly evident (I think) from my reading of the user guide: the relationship of a server to its GPU(s) is one of exclusive access.

Since you haven’t mentioned the number of GPUs involved, or how the MPS servers map to GPUs in the multi-server case, I don’t know whether this applies to your setup.
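If more than one GPU is available, the usual way to keep the servers apart is to restrict each control daemon to its own device with CUDA_VISIBLE_DEVICES before starting it. A minimal sketch (the directory names follow your scheme; the helper name is illustrative):

```python
import os
import subprocess

def start_mps_server_on_gpu(gpu_id: int, pipe_dir: str, log_dir: str) -> None:
    """Start one MPS control daemon that only sees a single GPU."""
    os.makedirs(pipe_dir, exist_ok=True)
    os.makedirs(log_dir, exist_ok=True)
    env = os.environ.copy()
    # Restrict this daemon (and the server it spawns) to one device;
    # CUDA_VISIBLE_DEVICES must be set before the daemon starts.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    env["CUDA_MPS_PIPE_DIRECTORY"] = pipe_dir
    env["CUDA_MPS_LOG_DIRECTORY"] = log_dir
    subprocess.run(["nvidia-cuda-mps-control", "-d"], env=env, check=True)

# One daemon per GPU, each with its own pipe/log directories.
for gpu in (0, 1):
    start_mps_server_on_gpu(gpu, f"/tmp/pipe/gpu{gpu}", f"/tmp/log/gpu{gpu}")
```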