Multi-Process Service crashes after setting `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` when launching a large number of processes (around 40 to 48)

abhishekghos · August 11, 2023, 9:26am

Here are the steps to reproduce the issue I am facing. I have the following toy CUDA program (the compiled executable is named toy).

/**************toy.cu*********************************/
#include <cuda.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#define BLOCK_SIZE 256

__global__
void do_something(float* d_array)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    d_array[idx]*=100;
}
int main()
{
    long N= 1<<10;
    float *arr = (float*) malloc(N*sizeof(float));
    long i;
    for (i=1;i<=N;i++)
        arr[i-1]=i;
    
    float *d_array;
    int ret;
    
    ret = cudaMalloc(&d_array, N*sizeof(float));
    printf("Return value of cudaMalloc = %d\n", ret);
    ret = cudaMemcpy(d_array, arr, N*sizeof(float), cudaMemcpyHostToDevice);
    printf("Return value of cudaMemcpy = %d\n", ret);

    int num_blocks= (N+BLOCK_SIZE-1)/BLOCK_SIZE;
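    do_something<<<num_blocks, BLOCK_SIZE>>>(d_array);
    /* A failed kernel launch is otherwise silent, so report its status too */
    ret = cudaGetLastError();
    printf("Return value of kernel launch = %d\n", ret);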

    ret = cudaMemcpy(arr, d_array, N*sizeof(float), cudaMemcpyDeviceToHost);
    printf("Return value of cudaMemcpy = %d\n", ret);

    int j;
    for(i=0;i<N;)
    {
        for(j=0;j<8;j++)
                printf("%.0f\t", arr[i++]);
        printf("\n");
    }
    cudaFree(d_array);
    return 0;
}

Using the following script (toy_launch.sh), I can launch many instances of this program simultaneously without any issue when MPS is not running.

#!/bin/bash
# Check if the number of loop iterations is provided
if [ "$#" -lt 1 ]; then
    echo "Usage: $0 <num_iterations>"
    exit 1
fi

# Access the number of loop iterations from the first command-line argument
num_iterations="$1"

# Loop using the provided number of iterations
for (( i = 1; i <= num_iterations; i++ )); do
    ./toy &
done

$ ./toy_launch.sh 40 >> /dev/null

The above script works fine without MPS.

I enable MPS with the following command:

sudo CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0 nvidia-cuda-mps-control -d

$ ./toy_launch.sh 40 >> /dev/null

With MPS running, the script still works fine until I set the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable.

I set the environment variable as follows:

$ export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=2
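As an alternative to exporting the variable in the whole shell, it can be set only in the environment of each launched client. The following is a minimal sketch of that idea; the function name run_with_thread_pct is hypothetical and not part of any NVIDIA tool. It relies on the documented behaviour that an MPS client reads CUDA_MPS_ACTIVE_THREAD_PERCENTAGE from its own environment before it creates its CUDA context.

```shell
#!/bin/bash
# Run one command with CUDA_MPS_ACTIVE_THREAD_PERCENTAGE set only in that
# command's environment, leaving the calling shell untouched.
# run_with_thread_pct is a hypothetical helper name used for illustration.
run_with_thread_pct() {
    if [ "$#" -lt 2 ]; then
        echo "Usage: run_with_thread_pct <percentage> <command> [args...]" >&2
        return 1
    fi
    local pct="$1"
    shift
    # env(1) applies the assignment to the child process only
    env CUDA_MPS_ACTIVE_THREAD_PERCENTAGE="$pct" "$@"
}
```

For example, `run_with_thread_pct 2 ./toy &` inside the launch loop would apply the 2% limit to the toy instances only, which makes it easier to compare runs with and without the limit from the same shell.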

Now I run the same script again:

$ ./toy_launch.sh 40 >> /dev/null

This causes MPS to hang after handling only about 18 requests.
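To pin down how many instances actually finish before the hang, the launcher could wait for every instance and put a time limit on each one. The sketch below is only an illustration under some assumptions: launch_with_timeout is a hypothetical helper name, the time limit is arbitrary, and GNU timeout is assumed to be available. Since toy always returns 0, "ok" here only means that the process exited on its own, not that the kernel ran correctly.

```shell
#!/bin/bash
# Launch N copies of a command in the background, each under a time limit,
# then report how many exited normally, timed out, or failed.
# launch_with_timeout is a hypothetical helper name used for illustration.
launch_with_timeout() {
    local n="$1" limit="$2"
    shift 2
    local pids=() ok=0 timed_out=0 failed=0 pid status i

    for (( i = 1; i <= n; i++ )); do
        # timeout(1) exits with status 124 if the command ran past the limit
        timeout "$limit" "$@" > /dev/null 2>&1 &
        pids+=("$!")
    done

    # Collect the exit status of every instance
    for pid in "${pids[@]}"; do
        wait "$pid"
        status=$?
        if [ "$status" -eq 0 ]; then
            ok=$(( ok + 1 ))
        elif [ "$status" -eq 124 ]; then
            timed_out=$(( timed_out + 1 ))
        else
            failed=$(( failed + 1 ))
        fi
    done

    echo "ok=$ok timed_out=$timed_out failed=$failed"
}
```

A call such as `launch_with_timeout 40 30 ./toy` would then print one summary line for the 40 instances. A process that is stuck inside the driver may not react to the signal sent by timeout, so the summary itself could also stall in that case.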

After that, the machine is unable to execute any more GPU programs. nvidia-smi shows nvidia-cuda-mps-server running, but when I try to quit the daemon as follows:

$ sudo nvidia-cuda-mps-control
quit

This does not seem to have any effect; the prompt just hangs there. Killing the server manually with the kill command and the server's PID stops MPS, and I can launch GPU programs again.

However, a further problem arises when I try to restart MPS:

sudo CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0 nvidia-cuda-mps-control -d

When I then launch the CUDA program, it runs, but apparently without using the GPU.

The output is:

...
1       2       3       4       5       6       7       8
9       10      11      12      13      14      15      16
17      18      19      20      21      22      23      24
25      26      27      28      29      30      31      32
33...

instead of:

...
100     200     300     400     500     600     700     800
900     1000    1100    1200    1300    1400    1500    1600
1700    1800    1900    2000    2100    2200    2300    2400
2500    2600    2700    2800    2900    3000    3100    3200
3300...

Also, nvidia-smi does not list nvidia-cuda-mps-server after the above program has run. [Note that while the program is executing, nvidia-cuda-mps-server shows up in nvidia-smi for a very short time and then disappears. It seems that the server is trying to start but is unable to.]
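When the server starts and immediately goes away like this, the MPS control and server logs are a natural place to look for the reason. The sketch below assumes the logs are in the directory named by CUDA_MPS_LOG_DIRECTORY or, if that is unset, in the documented default /var/log/nvidia-mps. The function name show_mps_logs is hypothetical.

```shell
#!/bin/bash
# Print the last lines of the MPS control and server logs.
# Assumption: logs are in $CUDA_MPS_LOG_DIRECTORY, or in the default
# /var/log/nvidia-mps when that variable is unset.
# show_mps_logs is a hypothetical helper name used for illustration.
show_mps_logs() {
    local log_dir="${1:-${CUDA_MPS_LOG_DIRECTORY:-/var/log/nvidia-mps}}"
    local lines="${2:-20}"
    local name

    for name in control.log server.log; do
        if [ -r "$log_dir/$name" ]; then
            echo "== $log_dir/$name =="
            tail -n "$lines" "$log_dir/$name"
        else
            echo "== $log_dir/$name: not readable =="
        fi
    done
}
```

Because the daemon here was started with sudo, the log files may only be readable by root, so the function would probably have to be run from a root shell.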
