I am running multiple Docker containers using the following command, which exposes all GPU devices to each container:
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Scenario:
- Container C1 runs Process P1, which internally needs to use GPU IDs 1, 2, 3, and 4.
- Container C2 runs Process P2 at the same time.
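
For concreteness, both containers are started the same way, roughly as follows (the image and script names are placeholders for illustration, not my actual workload):

```bash
# Both containers are given access to every GPU on the host.
# "my-image", p1.py and p2.py are placeholder names.
sudo docker run --rm --runtime=nvidia --gpus all my-image python p1.py   # C1 running P1
sudo docker run --rm --runtime=nvidia --gpus all my-image python p2.py   # C2 running P2
```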
Since both containers receive access to all GPU devices, I want to understand the expected behavior:
Question:
Is it possible that P2 in C2 also selects GPUs 1, 2, 3, and 4, leading to GPU resource contention, crashes, or failed inference/training runs?
In other words:
- Does Docker or the NVIDIA runtime enforce per-container GPU isolation automatically?
- Or do I need to explicitly restrict GPU visibility per container (e.g., using --gpus '"device=0,1"', as in the sketch after this list)?
- Can GPU resource contention occur if two containers try to use the same GPU simultaneously?
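
For reference, this is the kind of per-container restriction I have in mind, if it turns out to be necessary (the device indices and image/script names below are purely illustrative and depend on how many GPUs the host actually has):

```bash
# Hypothetical split: C1 is limited to GPUs 1-4, C2 to GPU 0.
sudo docker run --rm --runtime=nvidia --gpus '"device=1,2,3,4"' my-image python p1.py   # C1 / P1
sudo docker run --rm --runtime=nvidia --gpus device=0           my-image python p2.py   # C2 / P2

# With the NVIDIA runtime, GPU visibility can also be limited through the
# NVIDIA_VISIBLE_DEVICES environment variable, e.g.:
# sudo docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1,2,3,4 my-image python p1.py
```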