Multi-Process Service (MPS) crashes after setting `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` when launching a large number of processes (around 40 to 48)

Here is how to reproduce the issue I am facing. I have the following toy CUDA program (the compiled executable is named toy).

/**************toy.cu*********************************/
#include <cuda.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#define BLOCK_SIZE 256

__global__
void do_something(float* d_array)
{
    int idx = blockIdx.x*blockDim.x + threadIdx.x;
    d_array[idx]*=100;
}
int main()
{
    long N= 1<<10;
    float *arr = (float*) malloc(N*sizeof(float));
    long i;
    for (i=1;i<=N;i++)
        arr[i-1]=i;
    
    float *d_array;
    int ret;
    
    ret = cudaMalloc(&d_array, N*sizeof(float));
    printf("Return value of cudaMalloc = %d\n", ret);
    ret = cudaMemcpy(d_array, arr, N*sizeof(float), cudaMemcpyHostToDevice);
    printf("Return value of cudaMemcpy = %d\n", ret);

    int num_blocks= (N+BLOCK_SIZE-1)/BLOCK_SIZE;
    do_something<<<num_blocks, BLOCK_SIZE>>>(d_array);

    ret = cudaMemcpy(arr, d_array, N*sizeof(float), cudaMemcpyDeviceToHost);
    printf("Return value of cudaMemcpy = %d\n", ret);

    int j;
    for(i=0;i<N;)
    {
        for(j=0;j<8;j++)
            printf("%.0f\t", arr[i++]);
        printf("\n");
    }
    cudaFree(d_array);
    return 0;
}

Using the following script (toy_launch.sh), I can launch many instances of this program simultaneously without any issue when MPS is not running.

#!/bin/bash
# Check if the number of loop iterations is provided
if [ "$#" -lt 1 ]; then
    echo "Usage: $0 <num_iterations>"
    exit 1
fi

# Access the number of loop iterations from the first command-line argument
num_iterations="$1"

# Loop using the provided number of iterations
for (( i = 1; i <= num_iterations; i++ )); do
    ./toy &
done

$ ./toy_launch.sh 40 >> /dev/null

The above script works fine without MPS.

I then enable MPS with the following command and run the launcher again:

$ sudo CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0 nvidia-cuda-mps-control -d
$ ./toy_launch.sh 40 >> /dev/null

The script above works fine until I set the CUDA_MPS_ACTIVE_THREAD_PERCENTAGE environment variable.

I set the environment variable as follows:

$ export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=2
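
Because `export` keeps the variable set for everything launched later from the same shell, including runs made after MPS is restarted, it can help to scope the variable to a single command when comparing runs with and without the limit. A minimal sketch; `run_with_thread_pct` is a hypothetical helper name, not part of any NVIDIA tool:

```bash
# run_with_thread_pct PCT CMD [ARGS...]
# Runs CMD with CUDA_MPS_ACTIVE_THREAD_PERCENTAGE set to PCT for that
# command only, leaving the calling shell's environment untouched.
run_with_thread_pct() {
    pct=$1
    shift
    CUDA_MPS_ACTIVE_THREAD_PERCENTAGE="$pct" "$@"
}
```

For example, `run_with_thread_pct 2 ./toy_launch.sh 40 >> /dev/null` would apply the 2% setting to that one launch, and a plain `./toy_launch.sh 40` afterwards would run without it.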

Now I run the same script again:

$ ./toy_launch.sh 40 >> /dev/null

This causes MPS to hang after handling only about 18 of the launches.
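
To put a firmer number on the "18 or so", the launcher could track each instance and count how many exit on their own. A minimal sketch, assuming GNU coreutils `timeout` is available; the function name `launch_and_count` and the 30-second default are placeholders chosen for illustration:

```bash
# launch_and_count CMD N [LIMIT]
# Starts N copies of CMD in the background, each under `timeout`,
# then waits for every one and reports how many exited with status 0.
launch_and_count() {
    cmd=$1
    n=$2
    limit=${3:-30}
    pids=""
    ok=0
    i=1
    while [ "$i" -le "$n" ]; do
        timeout "$limit" "$cmd" > /dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    for pid in $pids; do
        # `wait PID` returns that child's exit status;
        # `timeout` itself exits with 124 when the limit expires.
        if wait "$pid"; then
            ok=$((ok + 1))
        fi
    done
    echo "$ok of $n instances exited cleanly"
}
```

`launch_and_count ./toy 40` would print a single summary line. Note that toy returns 0 even when a CUDA call fails, so a clean exit here only shows that the process did not hang, and the sketch assumes a hung instance still responds to the signal `timeout` sends.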

After that, the machine cannot run any more GPU programs. nvidia-smi shows nvidia-cuda-mps-server running, but trying to quit the daemon with:

$ sudo nvidia-cuda-mps-control
quit

does not seem to have any effect; the prompt just hangs there. Manually killing the server with the kill command and its PID stops MPS, and I can launch GPU programs again.
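
Since the interactive prompt blocks, the quit command can also be fed to the control program on standard input under a time limit, so that a stuck daemon does not tie up the terminal. A sketch, assuming GNU coreutils `timeout`; `send_quit` is a hypothetical helper, and it only makes the hang observable rather than fixing it:

```bash
# send_quit LIMIT CMD [ARGS...]
# Pipes "quit" into CMD (the MPS control program) and gives up after
# LIMIT seconds instead of blocking forever.
send_quit() {
    limit=$1
    shift
    if printf 'quit\n' | timeout "$limit" "$@"; then
        echo "quit delivered"
    else
        # 124 is the status `timeout` uses when the limit expired
        echo "quit not acknowledged (status $?)"
    fi
}
```

With the daemon started under sudo as above, the call would be `send_quit 10 sudo nvidia-cuda-mps-control`.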

However, a new problem arises when I try to restart MPS:

$ sudo CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=0 nvidia-cuda-mps-control -d

When I then launch the CUDA program, it runs without using the GPU.

The output is:

...
1       2       3       4       5       6       7       8
9       10      11      12      13      14      15      16
17      18      19      20      21      22      23      24
25      26      27      28      29      30      31      32
33...

instead of:

...
100     200     300     400     500     600     700     800
900     1000    1100    1200    1300    1400    1500    1600
1700    1800    1900    2000    2100    2200    2300    2400
2500    2600    2700    2800    2900    3000    3100    3200
3300...
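
Since toy prints the array whether or not the kernel actually ran, the two cases above can be told apart mechanically from the first data row. A minimal sketch; `check_toy_output` is a hypothetical helper, and it assumes the data rows are the only output lines that begin with a digit:

```bash
# check_toy_output
# Reads toy's stdout on stdin and classifies the first data row:
# 100 means the kernel scaled the array, 1 means the input came back unchanged.
check_toy_output() {
    awk '
        /^[0-9]/ {
            if ($1 == 100)    print "kernel ran"
            else if ($1 == 1) print "kernel did not run"
            else              print "unexpected value: " $1
            found = 1
            exit
        }
        END { if (!found) print "no data rows" }
    '
}
```

It would be used as `./toy | check_toy_output`, which makes it easy to check many instances without reading the full dump.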

Also, nvidia-smi does not list nvidia-cuda-mps-server after the program has run. (While the program is executing, nvidia-cuda-mps-server shows up in nvidia-smi for a very short time and then disappears. It seems that the server is trying to start but is unable to.)