MPS set_default_active_thread_percentage not working as expected

Hi, according to the CUDA MPS R495 docs (October 2021), we can set the default active thread percentage using nvidia-cuda-mps-control.

  • What I did: Ran nvidia-cuda-mps-control and issued set_default_active_thread_percentage 50.
  • What I expected: All future CUDA clients see only 50% of the SMs as available. This can be verified by checking the cudaDevAttrMultiProcessorCount attribute in code.
  • What actually happens: cudaDevAttrMultiProcessorCount still reports 100% of the SMs as available.

System Configuration

  • NVIDIA-SMI 470.63.01
  • Driver Version: 470.63.01
  • CUDA Version: 11.4
  • Tesla V100-PCIE GPU
  • Ubuntu 18.04.6 LTS

Exact steps to replicate:

Step 1: Start the CUDA MPS server.

sudo nvidia-smi -i 0 -c 1
sudo CUDA_VISIBLE_DEVICES="UUID" nvidia-cuda-mps-control -d

Step 2: Create a small C++ file like the following and compile it using nvcc hw.cpp -o a.out.

// hw.cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaSetDevice(0);
    cudaDeviceProp devProp;
    // multiProcessorCount reflects the SM count visible to this client,
    // which MPS reduces when an active thread percentage is in effect.
    cudaError_t err = cudaGetDeviceProperties(&devProp, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("cudaDevAttrMultiProcessorCount: %d\n\n", devProp.multiProcessorCount);
    return 0;
}

Step 3: Run the C++ client application with ./a.out. We get the output:

cudaDevAttrMultiProcessorCount: 80

This makes sense for a V100 GPU, which has 80 SMs in total.

Step 4: Try running CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25 ./a.out and you get an output of:

cudaDevAttrMultiProcessorCount: 20

25% of 80 == 20, so this makes sense.

Step 5: Run nvidia-cuda-mps-control and issue set_default_active_thread_percentage 25. According to the documentation, this should ensure that every client uses only 20 SMs (i.e., it should be equivalent to Step 4).

Step 6: Having set set_default_active_thread_percentage 25, run ./a.out. We get the output:

cudaDevAttrMultiProcessorCount: 80

This does not make sense; it should be 20.

Also, another question: what exactly is the difference between uniform and non-uniform partitioning? From the docs:

The provisioning limit can be set via a few different mechanisms for different effects. These mechanisms are categorized into two mechanisms: active thread percentage and programmatic interface. In particular, partitioning via active thread percentage are categorized into two strategies: uniform partitioning and non-uniform partitioning.

From what I understand, using CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25 is uniform partitioning: once I start the process with this environment variable set to 25%, I cannot change the percentage allocated to that process. With non-uniform partitioning, it seems you can change the active percentage after starting the process, from within the process itself. Are “non-uniform partitioning” and “programmatic partitioning” the same thing, then?
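For reference, the “programmatic interface” the docs mention appears to correspond to the execution-affinity API added in CUDA 11.4, where a context is created with an SM-count limit via the driver API. A minimal sketch (an assumption on my part that this is the interface meant; requires a Volta-or-newer GPU to actually run):

```cpp
#include <cstdio>
#include <cuda.h>  // CUDA driver API

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    // Request a context limited to 20 SMs via execution affinity
    // (CU_EXEC_AFFINITY_TYPE_SM_COUNT, available since CUDA 11.4).
    CUexecAffinityParam param;
    param.type = CU_EXEC_AFFINITY_TYPE_SM_COUNT;
    param.param.smCount.val = 20;

    CUcontext ctx;
    CUresult res = cuCtxCreate_v3(&ctx, &param, 1, 0, dev);
    if (res != CUDA_SUCCESS) {
        fprintf(stderr, "cuCtxCreate_v3 failed: %d\n", (int)res);
        return 1;
    }

    // Work launched in this context should be limited to roughly 20 SMs.
    cuCtxDestroy(ctx);
    return 0;
}
```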

It looks like you may be doing things in the wrong order. Note the documentation:

set_default_active_thread_percentage - this overrides the default active thread percentage for MPS servers. If there is already a server spawned, this command will only affect the next server.

(emphasis added)


I see, thanks! Doing step 5 immediately after Step 1 worked.
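For anyone landing here later, the full sequence that worked can be sketched as follows (a hedged transcript based on the steps above; "UUID" stands in for the actual GPU UUID, as in the original post):

```shell
# Put the GPU in exclusive-process mode and start the MPS control daemon.
sudo nvidia-smi -i 0 -c 1
sudo CUDA_VISIBLE_DEVICES="UUID" nvidia-cuda-mps-control -d

# Set the default percentage BEFORE any server is spawned,
# i.e. before the first CUDA client connects.
echo "set_default_active_thread_percentage 25" | nvidia-cuda-mps-control

# The first client now spawns a server that inherits the 25% default.
./a.out
```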
