CUDA MPS Not Working as Expected in Multi-GPU Environment

Hello everyone,

I’m currently facing an issue with CUDA MPS in a multi-GPU environment. MPS works as expected in a single-GPU setting, but in a multi-GPU environment, all submitted jobs seem to be routed to the first GPU, leaving the remaining GPUs idle while other jobs sit in the queue.

System and Configuration Details

I’m using Slurm 23.11.9. Below are my Slurm and configuration details:

Slurm configuration:

(base) vinil@slurmgpu-scheduler:~$ grep Gres /etc/slurm/slurm.conf
GresTypes=gpu,mps

(base) vinil@slurmgpu-scheduler:~$ grep Gres /etc/slurm/azure.conf
Nodename=slurmgpu-hpc-1 Feature=cloud STATE=CLOUD CPUs=96 ThreadsPerCore=1 RealMemory=875520 Gres=gpu:8,mps:800

Gres configuration:

(base) vinil@slurmgpu-scheduler:~$ cat /etc/slurm/gres.conf
Nodename=slurmgpu-hpc-1 Name=gpu Count=8 File=/dev/nvidia[0-7]
Nodename=slurmgpu-hpc-1 Name=mps Count=800 File=/dev/nvidia[0-7]

Job Script Details

Here’s the job script I’m using:

#!/bin/bash
#SBATCH --job-name=cuda_mps_job
#SBATCH --output=cuda_mps_output.%j
#SBATCH --error=cuda_mps_error.%j
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=3
#SBATCH --gres=mps:25
#SBATCH --time=01:00:00
#SBATCH --partition=hpc

export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps-$SLURM_JOB_ID
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log-$SLURM_JOB_ID

mkdir -p $CUDA_MPS_PIPE_DIRECTORY
mkdir -p $CUDA_MPS_LOG_DIRECTORY

if ! pgrep -x "nvidia-cuda-mps-control" > /dev/null; then
    echo "Starting MPS control daemon..."
    nvidia-cuda-mps-control -d
fi

export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25

source /shared/home/vinil/anaconda3/etc/profile.d/conda.sh
conda activate training_env
python distributed_training.py

echo "Stopping MPS control daemon..."
echo quit | nvidia-cuda-mps-control
rm -rf $CUDA_MPS_PIPE_DIRECTORY
rm -rf $CUDA_MPS_LOG_DIRECTORY

Issue Details

In my setup, I have configured 800 MPS shares, aiming for 100 MPS shares per GPU. Each job is configured to use 25 MPS shares, which should allow four jobs per GPU (32 jobs total on an 8-GPU node). However, when I submit jobs, only the first GPU is utilized, while the rest are idle, causing other jobs to remain in the queue.
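For reference, the batch is submitted with a simple loop along these lines (the job script filename is just a placeholder for the script shown above):

for i in $(seq 1 32); do
    sbatch cuda_mps_job.sh    # each job requests --gres=mps:25, i.e. a quarter of one GPU's 100 shares
done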

What I’ve Tried

  • Setting CUDA_VISIBLE_DEVICES as described in the NVIDIA MPS documentation.
  • Slurm OPT_MULTIPLE_SHARING_GRES_PJ: set this flag in slurm.conf, as suggested in the Slurm docs, to allow jobs to share multiple GPUs, but it made no difference (see the sketch below this list).
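For reference, the change from the second bullet looked roughly like this in slurm.conf (the flag is spelled without the OPT_ prefix there, if I am reading the SelectTypeParameters documentation correctly; the other values are just what my cluster already uses):

SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory,MULTIPLE_SHARING_GRES_PJ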

Output from squeue shows only jobs assigned to the first GPU, with the remaining jobs queued due to priority/resource limits.

(base) vinil@slurmgpu-scheduler:~$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
68 hpc cuda_mps vinil CF 0:03 1 slurmgpu-hpc-1
65 hpc cuda_mps vinil CF 0:04 1 slurmgpu-hpc-1
66 hpc cuda_mps vinil CF 0:04 1 slurmgpu-hpc-1
67 hpc cuda_mps vinil CF 0:04 1 slurmgpu-hpc-1
96 hpc cuda_mps vinil PD 0:00 1 (Priority)
95 hpc cuda_mps vinil PD 0:00 1 (Priority)
94 hpc cuda_mps vinil PD 0:00 1 (Priority)
93 hpc cuda_mps vinil PD 0:00 1 (Priority)
92 hpc cuda_mps vinil PD 0:00 1 (Priority)
91 hpc cuda_mps vinil PD 0:00 1 (Priority)
90 hpc cuda_mps vinil PD 0:00 1 (Priority)
89 hpc cuda_mps vinil PD 0:00 1 (Priority)
88 hpc cuda_mps vinil PD 0:00 1 (Priority)
87 hpc cuda_mps vinil PD 0:00 1 (Priority)
86 hpc cuda_mps vinil PD 0:00 1 (Priority)
85 hpc cuda_mps vinil PD 0:00 1 (Priority)
84 hpc cuda_mps vinil PD 0:00 1 (Priority)
83 hpc cuda_mps vinil PD 0:00 1 (Priority)
82 hpc cuda_mps vinil PD 0:00 1 (Priority)
81 hpc cuda_mps vinil PD 0:00 1 (Priority)
80 hpc cuda_mps vinil PD 0:00 1 (Priority)
79 hpc cuda_mps vinil PD 0:00 1 (Priority)
78 hpc cuda_mps vinil PD 0:00 1 (Priority)
77 hpc cuda_mps vinil PD 0:00 1 (Priority)
76 hpc cuda_mps vinil PD 0:00 1 (Priority)
75 hpc cuda_mps vinil PD 0:00 1 (Priority)
74 hpc cuda_mps vinil PD 0:00 1 (Priority)
73 hpc cuda_mps vinil PD 0:00 1 (Priority)
72 hpc cuda_mps vinil PD 0:00 1 (Priority)
71 hpc cuda_mps vinil PD 0:00 1 (Priority)
70 hpc cuda_mps vinil PD 0:00 1 (Priority)
69 hpc cuda_mps vinil PD 0:00 1 (Resources)

nvidia-smi output confirms that only the first GPU is active:

| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB Off | 00000001:00:00.0 Off | 0 |
| N/A 38C P0 85W / 400W | 34066MiB / 40960MiB | 93% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100-SXM4-40GB Off | 00000002:00:00.0 Off | 0 |
| N/A 34C P0 54W / 400W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A100-SXM4-40GB Off | 00000003:00:00.0 Off | 0 |
| N/A 35C P0 52W / 400W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A100-SXM4-40GB Off | 00000004:00:00.0 Off | 0 |
| N/A 35C P0 57W / 400W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA A100-SXM4-40GB Off | 0000000B:00:00.0 Off | 0 |
| N/A 35C P0 53W / 400W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA A100-SXM4-40GB Off | 0000000C:00:00.0 Off | 0 |
| N/A 35C P0 55W / 400W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA A100-SXM4-40GB Off | 0000000D:00:00.0 Off | 0 |
| N/A 35C P0 55W / 400W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA A100-SXM4-40GB Off | 0000000E:00:00.0 Off | 0 |
| N/A 35C P0 55W / 400W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 19017 M+C python 8480MiB |
| 0 N/A N/A 19018 M+C python 8480MiB |
| 0 N/A N/A 19019 M+C python 8480MiB |
| 0 N/A N/A 19020 M+C python 8480MiB |
| 0 N/A N/A 19045 C nvidia-cuda-mps-server 30MiB |
| 0 N/A N/A 19049 C nvidia-cuda-mps-server 30MiB |
| 0 N/A N/A 19050 C nvidia-cuda-mps-server 30MiB |
| 0 N/A N/A 19051 C nvidia-cuda-mps-server 30MiB |
+-----------------------------------------------------------------------------------------+

Request

Has anyone experienced similar issues or have insights on resolving this? Any help or suggestions would be much appreciated!

Although MPS can be configured to use multiple GPUs, when multiple GPUs are visible to the MPS server/daemon there is no automatic distribution mechanism that routes different jobs to different GPUs. The usage model in this case is still the CUDA multi-GPU usage model.

Are you saying that, with Slurm, we cannot use MPS to run jobs in a multi-GPU environment?

No, I didn’t say anything about Slurm per se. The key message is that MPS by itself does not do automatic work distribution. It will not, for example, assign one single-GPU job to GPU 0 and the next single-GPU job to GPU 1 in round-robin fashion. When multiple GPUs are visible to the MPS server, you are in a multi-GPU environment, and that has ramifications you will need to think through.
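As a minimal sketch only (not a drop-in fix for your setup), the kind of explicit per-job GPU selection that has to happen somewhere would look like this; where the device index comes from, whether the scheduler or a wrapper script, is up to you:

# Hypothetical sketch: something other than MPS has to pick the GPU for this job
GPU_ID=3                                   # placeholder value; MPS will not choose it for you
export CUDA_VISIBLE_DEVICES=$GPU_ID        # restrict this job (and the daemon it starts) to one GPU
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps-gpu$GPU_ID
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log-gpu$GPU_ID
mkdir -p "$CUDA_MPS_PIPE_DIRECTORY" "$CUDA_MPS_LOG_DIRECTORY"
nvidia-cuda-mps-control -d                 # one control daemon per GPU in this sketch
python distributed_training.py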

Thanks. Has anyone used MPS on a Slurm multi-GPU cluster? I can’t find any references online; all the discussions seem to focus on single-GPU setups. Any insights would be appreciated.