When running containers, all GPUs are visible in the container regardless of the --gpus option setting

Hi.

I’m using the TensorFlow2 21.09-tf2-py3 and hpc-benchmarks containers from NGC.

Recently, I tried to limit the number of GPUs visible inside the containers for a test by using the "--gpus" option, but it does not work. Regardless of the "--gpus" setting, with any combination of GPU selections, all GPUs are visible in the container. I tried "--gpus" with GPU indices and with UUIDs, as well as the NVIDIA_VISIBLE_DEVICES option, but the container always includes all GPUs in the node.
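For reference, these are the option forms I understand "--gpus" and the NVIDIA runtime to accept (a sketch based on the Docker 19.03+ documentation; the index values and the UUID below are placeholders, not my actual devices):

  # select N GPUs by count
  docker run --rm --gpus 2 nvcr.io/nvidia/tensorflow:21.09-tf2-py3 nvidia-smi
  # select specific GPUs by index
  docker run --rm --gpus '"device=1,2"' nvcr.io/nvidia/tensorflow:21.09-tf2-py3 nvidia-smi
  # select a specific GPU by UUID (placeholder UUID)
  docker run --rm --gpus '"device=GPU-xxxxxxxx"' nvcr.io/nvidia/tensorflow:21.09-tf2-py3 nvidia-smi
  # equivalent selection via the NVIDIA runtime environment variable
  docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1 nvcr.io/nvidia/tensorflow:21.09-tf2-py3 nvidia-smi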

How can I limit the number of GPUs visible in the above containers?

PS. The following run correctly shows 3 GPUs:
"docker run --rm --gpus 1,2,3 nvidia/cuda:11.0-base nvidia-smi"
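For completeness, the GPU indices and UUIDs I pass to these options come from the host driver, listed with:

  # list GPU indices and UUIDs on the host
  nvidia-smi -L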

Here is my environment.

OS = Ubuntu 18.04 (4.15.0-162-generic)
NVIDIA Driver = 470.57.02
CUDA version = 11.4
Docker version = 20.10.9, build c2ea9bc (with NVIDIA Docker)
NGC container

  • TensorFlow2 21.09-tf2-py3
  • hpc-benchmarks

The following are the scripts I used to run TensorFlow.

  1. Select all GPUs
    docker run --runtime=nvidia --shm-size=4g --ulimit memlock=-1 -ti --privileged --rm -v $(pwd):/workspace/nvidia-examples/cnn/scripts nvcr.io/nvidia/tensorflow:21.09-tf2-py3

  2. Select one GPU (GPU 0) using "--gpus 1"
    docker run --runtime=nvidia --gpus 1 --shm-size=4g --ulimit memlock=-1 -ti --privileged --rm -v $(pwd):/workspace/nvidia-examples/cnn/scripts nvcr.io/nvidia/tensorflow:21.09-tf2-py3

  3. Select one GPU (GPU 1) using "-e NVIDIA_VISIBLE_DEVICES=1"
    docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1 --shm-size=4g --ulimit memlock=-1 -ti --privileged --rm -v $(pwd):/workspace/nvidia-examples/cnn/scripts nvcr.io/nvidia/tensorflow:21.09-tf2-py3

==> Each run still shows all GPUs inside the container.
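For reference, this is a minimal way to check what the container actually exposes (assuming nvidia-smi and the image’s bundled TensorFlow 2 are available, which should be the case for the NGC TensorFlow container):

  # inside the running container: GPUs exposed by the NVIDIA runtime
  nvidia-smi -L
  # GPUs that TensorFlow 2 itself enumerates
  python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"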
