Docker pause leads to monopolizing GPU when Volta MPS on

sungin.h · November 2, 2022, 11:59am

Hi, I’m implementing a persistent-thread-style CUDA program on Apache OpenWhisk.

My persistent-thread-style CUDA process runs inside a Docker container and waits for new commands on a named pipe. After each execution, the container is paused.
The problem: while the container is paused and Volta MPS is enabled, no new CUDA process can start.
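For readers unfamiliar with this setup, the named-pipe command protocol can be sketched in shell. This is a minimal sketch under assumptions: process_command is a hypothetical placeholder for handing work to the already-resident persistent CUDA kernel, and quit is an assumed shutdown command. The real exec.exe is a CUDA program whose source is not shown in this post.

```shell
#!/usr/bin/env bash
# Sketch of a persistent worker that waits for commands on a named pipe.
# ASSUMPTION: process_command is a hypothetical stand-in for dispatching
# work to the resident persistent-thread CUDA kernel.

FIFO=${FIFO:-./fifo1}

process_command() {
  # Placeholder for the GPU work triggered by one command.
  printf 'handled: %s\n' "$1"
}

serve() {
  # Create the pipe if it does not exist yet (like "mkfifo fifo1").
  [ -p "$FIFO" ] || mkfifo "$FIFO" || return 1
  # Open the pipe read-write so the loop does not hit EOF every time a
  # writer such as "echo cmd > fifo1" closes its end (Linux behavior).
  exec 3<>"$FIFO"
  while IFS= read -r cmd <&3; do
    # ASSUMPTION: "quit" is a hypothetical command that ends the loop.
    [ "$cmd" = "quit" ] && break
    process_command "$cmd"
  done
  exec 3<&-
}
```

Run serve in the background (or in its own terminal) and send commands from another shell with echo, the same way the post does with echo ${command} > fifo1. Keeping the pipe open read-write is what lets a single long-lived process accept many commands without reopening the FIFO.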

If MPS is off, there is no problem: a new CUDA process starts normally.
If, instead of pausing a container, I send SIGSTOP to a persistent-thread-style CUDA process running natively (outside any container), there is also no problem: a new CUDA process starts normally.
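The native SIGSTOP comparison can be scripted roughly as below. The helper name try_while_stopped is hypothetical, and the usage line reuses the post's exec.exe and ./run names. GNU timeout is added so that a start-up that does not finish returns exit status 124 instead of blocking the shell; the 30-second deadline is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop a process with SIGSTOP, run a command while it
# is stopped, then resume the process with SIGCONT.
# Returns the exit status of the command that was run.
try_while_stopped() {
  local pid=$1 rc
  shift
  kill -STOP "$pid" || return 1
  "$@"
  rc=$?
  kill -CONT "$pid"
  return "$rc"
}

# Example (names taken from the post; the deadline is an assumption):
#   try_while_stopped "$(pidof exec.exe)" timeout 30 ./run
#   echo "exit status: $?"   # 124 means timeout killed ./run after 30 s
```

Because SIGCONT is sent whatever the command's outcome, the stopped process is always resumed, which keeps repeated experiments comparable.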

Here is how I tested it:
0. Environment
   Ubuntu 18.04 (kernel 5.4.0), x86_64
   A100 PCIe 40GB, driver 520.61.05 + CUDA 11.8.0 + Volta MPS
   Docker version 20.10.7, build 20.10.7-0ubuntu5~18.04.3

1. Start MPS
   $ su
   # nvidia-cuda-mps-control -d
2. Run Docker
   $ docker run -it --gpus=all -v /tmp/nvidia-mps --ipc=host ${MyDockerImageFrom_nvidia/cuda:11.8.0-runtime-ubuntu18.04} bash
   # mkfifo fifo1              # create the named pipe
   # ./exec.exe                # run the persistent-thread-style CUDA process
   # echo ${command} > fifo1   # succeeds
3. Pause the container
   $ docker pause ${the_container}
   (As far as I know, docker pause is based on the cgroup freezer, but I’m not familiar with cgroups.)
4. Run a new CUDA process natively (running it inside another container shows the same problem)
   $ ./run   # cannot start; it does not appear in nvidia-smi
5. Kill the persistent-thread-style CUDA process
   # kill $(pidof exec.exe)
   After this, the process from step 4 can start.
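Since docker pause relies on the cgroup freezer, it may help to confirm the container's freezer state while the new CUDA process is failing to start. The sketch below assumes cgroup v1 (the default on Ubuntu 18.04) and probes the two usual Docker layouts: the cgroupfs driver (docker/<id>) and the systemd driver (system.slice/docker-<id>.scope). The function name and the CGROUP_ROOT override are hypothetical; docker inspect --format '{{.State.Paused}}' gives a yes/no answer from Docker's side.

```shell
#!/usr/bin/env bash
# Print a container's cgroup v1 freezer state: THAWED, FREEZING or FROZEN.
# ASSUMPTIONS: cgroup v1 freezer hierarchy; Docker uses either the cgroupfs
# or the systemd cgroup driver. CGROUP_ROOT is a hypothetical override.
container_freezer_state() {
  local root=${CGROUP_ROOT:-/sys/fs/cgroup/freezer} id f
  # Resolve a container name or short ID to the full ID used in cgroup paths.
  id=$(docker inspect --format '{{.Id}}' "$1") || return 1
  for f in "$root/docker/$id/freezer.state" \
           "$root/system.slice/docker-$id.scope/freezer.state"; do
    if [ -r "$f" ]; then
      cat "$f"
      return 0
    fi
  done
  echo "freezer.state not found for $1" >&2
  return 1
}

# Example:
#   docker pause "$cid"; container_freezer_state "$cid"   # expect FROZEN
```

A state stuck at FREEZING rather than FROZEN would suggest that some task in the container has not yet been frozen, which is worth distinguishing from a fully frozen container.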

How can I solve this problem?
Any suggestions would be appreciated.
