Hi, according to the CUDA MPS R495 docs (October 2021), we can set the default active thread percentage using nvidia-cuda-mps-control.
- What I did: Ran nvidia-cuda-mps-control, and then entered set_default_active_thread_percentage 50.
- What is expected: All CUDA clients created afterwards use only 50% of the available SMs. This can be verified by checking the attribute cudaDevAttrMultiProcessorCount in code.
- What really happens: cudaDevAttrMultiProcessorCount shows 100% of the SMs being available.
System Configuration
- NVIDIA-SMI 470.63.01
- Driver Version: 470.63.01
- CUDA Version: 11.4
- Tesla V100-PCIE GPU
- Ubuntu 18.04.6 LTS
Exact steps to replicate:
Step 1: Start the CUDA MPS server.
sudo nvidia-smi -i 0 -c 1
sudo CUDA_VISIBLE_DEVICES="UUID" nvidia-cuda-mps-control -d
Step 2: Create a small C++ file that looks like this, and compile it using nvcc hw.cpp -o a.out.
// hw.cpp
#include <assert.h>
#include <stdio.h>
#include <cuda_runtime.h>
using namespace std;

int main() {
    cudaSetDevice(0);
    struct cudaDeviceProp devProp;
    cudaGetDeviceProperties(&devProp, 0);
    printf("cudaDevAttrMultiProcessorCount: %d\n\n", devProp.multiProcessorCount);
    return 0;
}
Step 3: Run the C++ client application using ./a.out. We get the output:
cudaDevAttrMultiProcessorCount: 80
Which makes sense for a V100 GPU, which has a total of 80 SMs.
Step 4: Try running CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25 ./a.out and you get an output of:
cudaDevAttrMultiProcessorCount: 20
25% of 80 == 20, so this makes sense.
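To make the numbers I expect explicit, here is a minimal sketch of the arithmetic. The helper name expected_sm_count is hypothetical, and the round-up is my assumption about how a percentage maps to an SM count; the actual rounding done by MPS may depend on the hardware.

```cpp
#include <cassert>

// Hypothetical helper: SM count a client should see for a given
// active thread percentage. ASSUMPTION: the limit is rounded up to a
// whole SM; the real MPS rounding may differ on some hardware.
int expected_sm_count(int total_sms, int percentage) {
    assert(total_sms > 0);
    assert(percentage > 0 && percentage <= 100);
    // Integer ceiling of total_sms * percentage / 100.
    return (total_sms * percentage + 99) / 100;
}
```

For the V100 here, expected_sm_count(80, 25) gives 20, which matches what the environment variable produces.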
Step 5: Run nvidia-cuda-mps-control and then enter set_default_active_thread_percentage 25. According to the documentation, this should ensure that every client uses only 20 SMs (equivalent to Step 4).
Step 6: Having set set_default_active_thread_percentage 25, run ./a.out. We get the output:
cudaDevAttrMultiProcessorCount: 80
Which does not make sense. It should be 20.
Also, another question: what exactly is the difference between uniform and non-uniform partitioning? From the docs:
The provisioning limit can be set via a few different mechanisms for different effects. These mechanisms are categorized into two mechanisms: active thread percentage and programmatic interface. In particular, partitioning via active thread percentage are categorized into two strategies: uniform partitioning and non-uniform partitioning.
From what I understand, using CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25 is uniform partitioning: once I start the process with this environment variable set to 25%, I cannot change the percentage allocated to this process. With non-uniform partitioning, it seems like you can change the active percentage after starting the process, from within the process itself. Are “non-uniform partitioning” and “programmatic partitioning” the same, then?