MPS set_default_active_thread_percentage not working as expected

rohitdwivedula · November 9, 2021, 8:07pm

Hi, according to the CUDA MPS R495 docs (October 2021), we can set the default active thread percentage using nvidia-cuda-mps-control.

What I did: Ran nvidia-cuda-mps-control, and then entered set_default_active_thread_percentage 50.
What is expected: All CUDA clients created from then on see only 50% of the SMs. This can be verified by checking the attribute cudaDevAttrMultiProcessorCount in code.
What really happens: cudaDevAttrMultiProcessorCount reports 100% of the SMs as available.

System Configuration

NVIDIA-SMI 470.63.01
Driver Version: 470.63.01
CUDA Version: 11.4
Tesla V100-PCIE GPU
Ubuntu 18.04.6 LTS

Exact steps to replicate:

Step 1: Start the CUDA MPS server.

sudo nvidia-smi -i 0 -c 1
sudo CUDA_VISIBLE_DEVICES="UUID" nvidia-cuda-mps-control -d

Step 2: Create a small C++ file that looks like this and compile using nvcc hw.cpp -o a.out.

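// hw.cpp
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    // Select device 0 and stop early if the runtime reports an error.
    cudaError_t err = cudaSetDevice(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Query the device properties; multiProcessorCount is the SM count
    // visible to this client.
    struct cudaDeviceProp devProp;
    err = cudaGetDeviceProperties(&devProp, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("cudaDevAttrMultiProcessorCount: %d\n\n", devProp.multiProcessorCount);
    return 0;
}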

Step 3: Run the C++ client application using ./a.out. The output is:

cudaDevAttrMultiProcessorCount: 80

Which makes sense for a V100 GPU, which has a total of 80 SMs.

Step 4: Run CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25 ./a.out. The output is:

cudaDevAttrMultiProcessorCount: 20

25% of 80 == 20, so this makes sense.
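The arithmetic behind that check can be written down as a small helper. This is only a sketch: expected_sm_count is a hypothetical name, not a CUDA or MPS function, and it assumes the limit is rounded up to a whole number of SMs. The real hardware granularity may be coarser, so treat the result as an estimate to compare against what the client reports.

```cpp
#include <cassert>

// Hypothetical helper (not part of CUDA or MPS): estimates how many SMs a
// client should see for a given integer active thread percentage.
// Assumption: the limit is rounded up to a whole SM; actual hardware
// rounding may differ.
int expected_sm_count(int total_sms, int percentage) {
    assert(total_sms > 0);
    assert(percentage > 0 && percentage <= 100);
    // Integer ceiling of total_sms * percentage / 100.
    return (total_sms * percentage + 99) / 100;
}
```

For the V100 in this post, expected_sm_count(80, 25) gives 20, which matches the Step 4 output.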

Step 5: Run nvidia-cuda-mps-control and then enter set_default_active_thread_percentage 25. According to the documentation, this should ensure that every client uses only 20 SMs (equivalent to Step 4).

Step 6: After entering set_default_active_thread_percentage 25, run ./a.out. The output is:

cudaDevAttrMultiProcessorCount: 80

This does not make sense; it should be 20.

Also, another question: what exactly is the difference between uniform and non-uniform partitioning? From the docs:

The provisioning limit can be set via a few different mechanisms for different effects. These mechanisms are categorized into two mechanisms: active thread percentage and programmatic interface. In particular, partitioning via active thread percentage are categorized into two strategies: uniform partitioning and non-uniform partitioning.

From what I understand, using CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25 is uniform partitioning: once I start the process with this environment variable set to 25, I cannot change the percentage allocated to that process. With non-uniform partitioning, it seems you can change the active thread percentage after starting the process, from within the process itself. Are “non-uniform partitioning” and “programmatic partitioning” the same thing, then?

Robert_Crovella · November 9, 2021, 8:31pm

It looks like you may be doing things in the wrong order. Note the documentation:

set_default_active_thread_percentage - this overrides the default active thread percentage for MPS servers. If there is already a server spawned, this command will only affect the next server.

(emphasis added)
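To make the ordering issue concrete, here is a toy model in C++. It is not how MPS is implemented: ControlDaemon and its members are hypothetical names, and the model assumes only what the quoted documentation states (the command affects the next server) plus the assumption that a server is spawned when the first client connects and keeps the percentage it started with.

```cpp
#include <cassert>

// Toy model only: hypothetical type, not the real MPS control daemon.
struct ControlDaemon {
    int default_percentage = 100;  // value the next server will start with
    bool server_running = false;   // assumption: spawned by the first client
    int server_percentage = 0;     // fixed once the server has been spawned

    // Models set_default_active_thread_percentage: it only changes the
    // default for a server that has not been spawned yet.
    void set_default_active_thread_percentage(int pct) {
        default_percentage = pct;
    }

    // Models a client connecting: spawn a server if none is running, then
    // return the percentage that the client is limited to.
    int connect_client() {
        if (!server_running) {
            server_running = true;
            server_percentage = default_percentage;
        }
        return server_percentage;
    }
};
```

In this model, connecting a client first and then setting the default to 25 still returns 100 for later clients, which mirrors Steps 3 to 6. Setting the default before the first client connects returns 25.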

rohitdwivedula · November 9, 2021, 8:44pm

I see, thanks! Doing Step 5 immediately after Step 1 worked.

