Configuring multiple Volta MPS servers for execution resource provisioning


I’m currently using a Titan V GPU and have some questions about configuring the “execution resource provisioning” option.

I can successfully restrict a client program’s thread usage by setting the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable (described in section 4.2.5 of the MPS documentation), but from what I understand, it should also be possible to set up a per-user MPS server using the nvidia-cuda-mps-control tool.
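For context, here’s the sort of per-process limit that already works for me (the application name is just a placeholder):

```shell
# Per-client limit via environment variable: with MPS running, this
# client's kernels are restricted to roughly half of the GPU's SMs.
export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50
./my_cuda_app   # placeholder for any CUDA client program
```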

Specifically, what I’d like to do is set up two users on my system and restrict each user to half of the GPU’s capacity, regardless of the number of CUDA processes each user starts. However, I only seem to be able to start one “server” at a time.

Here’s what I’m trying:

First, I start MPS the normal way, running as root.

nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
nvidia-cuda-mps-control -d

Next (still as root), I run nvidia-cuda-mps-control and enter the following commands (the UIDs for the two users I created are 1000 and 1001):

start_server -uid 1000

As expected, entering get_server_list at this point prints out the PID of the newly created server process. So far, so good. However, if I try to create a second server, nothing appears to happen:

start_server -uid 1001

After entering the above command, get_server_list still only prints out a single PID: the server created for the first UID.
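For completeness, I see the same behavior when issuing the control commands non-interactively by piping them into the daemon:

```shell
# Non-interactive form of the same control commands
echo "start_server -uid 1000" | nvidia-cuda-mps-control
echo "start_server -uid 1001" | nvidia-cuda-mps-control
echo "get_server_list" | nvidia-cuda-mps-control   # still prints only one PID
```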

So, my question is this: Is it even possible to create two independent MPS servers for separate users? If not, is there some other way to set a per-user resource limit using execution resource provisioning?

Thank you for any help!

A single GPU cannot be assigned to two separate MPS servers.
Also, MPS doesn’t prevent one user from allocating all of the GPU memory.
You may also want to read section 3.3.1 of the document you linked.
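If the goal is just a uniform cap rather than true per-user isolation, one option (a sketch only; check the MPS documentation for your driver version) is to set a default active thread percentage on the single control daemon, which applies to clients that connect afterwards:

```shell
# Set a default SM cap for future MPS clients on this daemon
# (requires a Volta-or-newer GPU and a sufficiently recent driver).
echo "set_default_active_thread_percentage 50" | nvidia-cuda-mps-control
```

Note that this limit is applied per client, not per user, so two clients capped at 50% can still overlap in their SM usage.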

I appreciate the response. I had read the relevant sections, but I was confused by the fact that nvidia-cuda-mps-control has a “get_server_list” command. I thought that might imply that multiple servers could somehow be running; I didn’t realize that it could enumerate servers across multiple GPUs.