[CUDA][WSL2][MPS] CUDA OOM with >14 MPI ranks under WSL2; MPS not starting

Hello,

I am running an MPI-based CUDA application under WSL2 (Ubuntu) with the following setup:

  • GPU: NVIDIA RTX A6000 (48GB)

  • Driver (WSL): 535.247.01 (nvidia-smi in WSL reports Driver Version 531.14, CUDA Version 12.1)

  • Driver (Windows host): 531.4

  • CUDA Toolkit (WSL): 11.6 (nvcc release 11.6, V11.6.55)

  • CUDA Toolkit (Windows): 12.1 (nvcc release 12.1, V12.1.105)

  • MPI: MPICH


Problem

  • Works fine with 14 MPI ranks.

  • With 15 ranks or more, CUDA reports out of memory (OOM) even though nvidia-smi shows usage far below 48GB (a minimal reproducer sketch is below).
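
For concreteness, the failure reduces to each rank creating a context on the shared GPU and making one allocation. A minimal sketch (the 256 MB per-rank size and the build line are placeholders, not my real application's allocation pattern):

```
// oom_repro.cu
// build (MPI paths may differ): nvcc -ccbin mpicxx oom_repro.cu -o oom_repro
// run: mpiexec -n 15 ./oom_repro
#include <mpi.h>
#include <cstdio>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(0);                 // all ranks share the single A6000

    void* buf = nullptr;
    size_t bytes = 256ull << 20;      // 256 MB per rank -- placeholder size
    cudaError_t err = cudaMalloc(&buf, bytes);
    std::printf("rank %2d: cudaMalloc(%zu MB) -> %s\n",
                rank, bytes >> 20, cudaGetErrorString(err));

    MPI_Finalize();
    return 0;
}
```

On my machine this pattern succeeds at -n 14 and starts failing at -n 15.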

Observations

  • With 20 ranks launched, OOM still occurs even when only half of the ranks actually allocate memory.

  • When splitting into two groups:

    • First 10 ranks allocate memory → GPU memory usage grows modestly (from ~5412 MB to ~8387 MB).

    • Second 10 ranks then attempt allocation → immediate OOM.

  • This suggests per-context driver overhead or a WSL2 limitation, not physical memory exhaustion (see the cudaMemGetInfo sketch below).
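
To separate "the driver has already reserved the memory" from "the memory is genuinely exhausted", each rank can ask the driver how much it considers free immediately after context creation, before any explicit allocation. A sketch along those lines:

```
// meminfo.cu -- print what the driver reports as free/total, per rank
#include <mpi.h>
#include <cstdio>
#include <cuda_runtime.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(0);
    size_t free_b = 0, total_b = 0;
    // cudaMemGetInfo triggers context creation on first use, so the drop
    // in "free" as rank count grows also measures per-context overhead.
    cudaError_t err = cudaMemGetInfo(&free_b, &total_b);
    std::printf("rank %2d: free %zu MB / total %zu MB (%s)\n",
                rank, free_b >> 20, total_b >> 20, cudaGetErrorString(err));

    MPI_Finalize();
    return 0;
}
```

If "free" drops by a large fixed amount per rank even though no rank has called cudaMalloc yet, that would point at per-context reservation rather than my allocations.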

MPS issue

  • Tried enabling MPS in WSL2:

    • Started nvidia-cuda-mps-control -d → the control socket appears.

    • No nvidia-cuda-mps-server process ever starts.

    • server.log is empty.

    • nvidia-smi never shows “Compute MPS Active”.

  • It seems MPS cannot start properly under WSL2 (the startup sequence I used is sketched below).
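
For reference, the sequence I used to bring up MPS is folded into the comments of the small probe below (the pipe/log directories are example paths, not requirements). The probe just forces context creation, which should make the daemon spawn an nvidia-cuda-mps-server process visible to ps and to get_server_list:

```
// mps_probe.cu -- force one CUDA context; with a working MPS daemon, an
// nvidia-cuda-mps-server process should appear while this runs.
//
// Shell steps beforehand (standard MPS setup; directories are examples):
//   export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
//   export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log
//   nvidia-cuda-mps-control -d
//   ./mps_probe &
//   echo get_server_list | nvidia-cuda-mps-control   # should print a server PID
#include <cstdio>
#include <unistd.h>
#include <cuda_runtime.h>

int main() {
    cudaFree(0);   // no-op allocation call that forces context creation
    std::printf("context created; check for nvidia-cuda-mps-server now\n");
    sleep(30);     // hold the context so any spawned server stays visible
    return 0;
}
```

In my case get_server_list prints nothing and no server process ever appears.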


Questions

  1. Why does OOM occur with >14 ranks despite unused GPU memory?

  2. Is this a known CUDA/WSL2 limitation (per-context memory reservation, driver behavior, etc.)?

  3. Is there a reliable way to allow more MPI ranks to share the GPU under WSL2 (e.g., via MPS)?

  4. How can I confirm whether MPS is supported/running in WSL2?

Any insights would be very helpful. Thank you!