Hello,
I am running an MPI-based CUDA application under WSL2 (Ubuntu) with the following setup:
- GPU: NVIDIA RTX A6000 (48 GB)
- Driver (WSL): 535.247.01 (Driver Version: 531.14, CUDA 12.1)
- Driver (Windows host): 531.4
- CUDA Toolkit (WSL): 11.6 (nvcc release 11.6, V11.6.55)
- CUDA Toolkit (Windows): 12.1 (nvcc release 12.1, V12.1.105)
- MPI: MPICH
Problem
- The application works fine with 14 MPI ranks.
- With 15 ranks or more, CUDA reports out of memory (OOM) even though nvidia-smi shows usage far below 48 GB.
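For reference, here is a minimal sketch of the per-rank allocation pattern (simplified: the buffer size and device index are illustrative placeholders, not the real application's values). Each rank creates its own CUDA context and reports what the runtime thinks is free before allocating:

```c
/* minimal_oom_repro.c — build with: mpicc minimal_oom_repro.c -o repro -lcudart
 * run with e.g.: mpirun -np 15 ./repro
 */
#include <stdio.h>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank gets its own CUDA context on the single GPU. */
    cudaSetDevice(0);

    /* Report free/total memory as seen from this rank's context. */
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    printf("rank %d before alloc: free %zu MB / total %zu MB\n",
           rank, free_b >> 20, total_b >> 20);

    /* Illustrative allocation; the real application allocates more. */
    void *buf = NULL;
    cudaError_t err = cudaMalloc(&buf, 256UL << 20); /* 256 MB */
    printf("rank %d cudaMalloc: %s\n", rank, cudaGetErrorString(err));

    /* Keep all contexts alive simultaneously, as in the real run. */
    MPI_Barrier(MPI_COMM_WORLD);

    if (buf) cudaFree(buf);
    MPI_Finalize();
    return 0;
}
```

With 14 ranks every `cudaMalloc` succeeds; with 15 some ranks get `cudaErrorMemoryAllocation` even though `cudaMemGetInfo` reports plenty free.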
Observations
- With 20 ranks, OOM still occurs even if only half the ranks allocate memory.
- When splitting the ranks into two groups:
  - The first 10 ranks allocate memory → GPU memory grows modestly (from ~5412 MB to ~8387 MB).
  - The second 10 ranks then attempt allocation → immediate OOM.
- This suggests the issue is driver/context overhead or a WSL2 limitation, not physical memory exhaustion.
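The arithmetic from the numbers above supports this: the visible growth per rank is far too small to exhaust a 48 GB card by rank 15.

```python
# Back-of-the-envelope check using the numbers observed above.
used_before_mb = 5412   # before the first group of 10 ranks allocates
used_after_mb = 8387    # after the first group of 10 ranks allocates
ranks = 10

per_rank_mb = (used_after_mb - used_before_mb) / ranks
print(f"visible growth per rank: ~{per_rank_mb:.1f} MB")  # ~297.5 MB

# Even 15 ranks at this rate is nowhere near the 48 GB card:
projected_15_ranks_gb = (used_before_mb + 15 * per_rank_mb) / 1024
print(f"projected usage at 15 ranks: ~{projected_15_ranks_gb:.1f} GB of 48 GB")
```

So whatever fails at rank 15 is not the physical VRAM filling up.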
MPS issue
- Tried enabling MPS in WSL2:
  - Started nvidia-cuda-mps-control -d → the control socket appears.
  - No nvidia-cuda-mps-server process ever starts.
  - server.log is empty.
  - nvidia-smi never shows "Compute MPS Active".
- It seems MPS cannot start properly under WSL2.
Questions
- Why does OOM occur with more than 14 ranks despite plenty of unused GPU memory?
- Is this a known CUDA/WSL2 limitation (per-context memory reservation, driver behavior, etc.)?
- Is there a reliable way to allow more MPI ranks to share the GPU under WSL2 (e.g., via MPS)?
- How can I confirm whether MPS is supported/running in WSL2?
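For context, these are the checks I used when concluding MPS was not running (the pipe/log directory paths are assumptions; adjust them if your environment sets them elsewhere):

```shell
# Point MPS at explicit pipe/log directories (paths here are assumptions).
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log
mkdir -p "$CUDA_MPS_PIPE_DIRECTORY" "$CUDA_MPS_LOG_DIRECTORY"

# Start the MPS control daemon.
nvidia-cuda-mps-control -d

# Ask the control daemon which MPS servers exist; on a working setup this
# lists a server PID once a CUDA client has connected.
echo get_server_list | nvidia-cuda-mps-control

# Look for the server process directly.
ps -ef | grep "[n]vidia-cuda-mps-server"

# Inspect the logs.
cat "$CUDA_MPS_LOG_DIRECTORY"/control.log
cat "$CUDA_MPS_LOG_DIRECTORY"/server.log
```

In my case get_server_list returns nothing, no server process appears, and server.log stays empty.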
Any insights would be very helpful. Thank you!