I’m currently using a Titan V GPU and have some questions about configuring the “execution resource provisioning” option.
I can successfully restrict a client program’s thread usage by setting the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable (described in section 4.2.5 of https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf), but from what I understand, it should be possible to set up a per-user MPS server using the nvidia-cuda-mps-control tool.
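For reference, here’s roughly what I mean by the working per-process approach (the program name ./my_cuda_app is just a placeholder for any CUDA client):

```shell
# Cap every CUDA client launched from this shell at 50% of the GPU's SMs.
# This only limits individual client processes, not a user as a whole.
export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50
# ./my_cuda_app   # placeholder: the client picks up the limit when it connects to MPS
```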
Specifically, what I’d like to do is set up two users on my system and restrict each user to half of the GPU’s capacity, regardless of the number of CUDA processes each user starts. However, I only seem to be able to start one “server” at a time.
Here’s what I’m trying:
First, I start MPS the normal way, running as root.
export CUDA_VISIBLE_DEVICES=0
nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
nvidia-cuda-mps-control -d
Next (still as root), I run nvidia-cuda-mps-control and enter the following commands (the UIDs for the two users I created are 1000 and 1001):
start_server -uid 1000
get_server_list
As expected, entering get_server_list at this point prints the PID of the newly created server process. However, if I try to create a second server, nothing appears to happen:
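In case it matters, I get the same behavior driving the control daemon non-interactively by piping commands into it (this obviously requires the daemon started above to be running; the UID is just my example):

```shell
# Non-interactive equivalent of the interactive session:
# each echo sends one command to the running MPS control daemon.
echo "start_server -uid 1000" | nvidia-cuda-mps-control
echo "get_server_list" | nvidia-cuda-mps-control
```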
start_server -uid 1001
get_server_list
After entering the above commands, get_server_list still prints only a single PID: the server created for the first UID.
So, my question is this: Is it even possible to create two independent MPS servers for separate users? If not, is there some other way to set a per-user resource limit using execution resource provisioning?
Thank you for any help!