Deploying Pytorch on MPS in multi-GPU machines

sharonxu65 · March 13, 2020, 5:51am

Hello

I’m trying to start a PyTorch training session on top of of multi-GPU machines with MPS. Previous I was able to deploy MPS on a machine with one GPU. I used the same process on a multi GPU machine and I’m getting an output that looks like:

±----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 84541 C nvidia-cuda-mps-server 25MiB |
| 0 88194 C python3 2549MiB |
| 1 84541 C nvidia-cuda-mps-server 25MiB |
| 1 88194 C python3 2463MiB |
| 2 84541 C nvidia-cuda-mps-server 25MiB |
| 2 88194 C python3 2449MiB |
| 3 84541 C nvidia-cuda-mps-server 25MiB |
| 3 88194 C python3 2457MiB |
±----------------------------------------------------------------------------+

Switching to one device also didn’t work as the output looks like this

±----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 84541 C nvidia-cuda-mps-server 25MiB |
| 0 91698 C python3 865MiB |
| 1 84541 C nvidia-cuda-mps-server 25MiB |
| 2 84541 C nvidia-cuda-mps-server 25MiB |
| 3 84541 C nvidia-cuda-mps-server 25MiB |
±----------------------------------------------------------------------------+

It looks like my process can never bind to the server – The server log is also completely empty.

Is there any way to debug this?

Thanks

Topic		Replies	Views
MPS Server is working with a single node multi-GPU but not working with two nodes multi-GPU CUDA Programming and Performance	0	644	March 28, 2024
MPI running issue using NVIDIA MPS Service on Multi-GPU nodes CUDA Programming and Performance	4	2198	September 16, 2016
MPS with multiGPUs Triton Inference Server - archived	0	960	May 9, 2020
MPS is not working CUDA Programming and Performance	7	3132	July 13, 2022
GPU memory cannot be released Deep Learning (Training & Inference)	0	1342	October 26, 2018
L40S - Multi GPU doesn't work CUDA Programming and Performance cuda , pytorch	3	207	February 12, 2025
GPU1 show C instead of M+C CUDA Programming and Performance	1	315	January 25, 2024
Mutli Process Service crashes on setting up the `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` when launching a huge number of processes (say around 40~48 ) CUDA Programming and Performance cuda , kernel , gpu , gpu-computing	0	720	August 11, 2023
PyTorch utilize CPU instead of GPU CUDA on Windows Subsystem for Linux	5	2840	November 25, 2020
Low performance on P6000 with AMD 1920x CUDA Programming and Performance	5	830	September 12, 2023

Deploying Pytorch on MPS in multi-GPU machines

Related topics