Error starting up CuOpt container

I receive an error when trying to run the container (see below). Looking for some guidance on diagnosing this. Thanks.

docker run --entrypoint /bin/bash  --network=host -it --gpus all --rm nvcr.io/ea-reopt-member-zone/ea-cuopt

output…
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: failed to add device rules: open /sys/fs/cgroup/devices/user.slice/devices.allow: permission denied: unknown.
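In case it helps with diagnosis, here is how the path named in the error can be inspected (generic commands, added for context only):

   # does the cgroup v1 devices path from the error exist, and who owns it?
   ls -l /sys/fs/cgroup/devices/user.slice/devices.allow
   # filesystem type of /sys/fs/cgroup: cgroup2fs means cgroup v2, tmpfs means cgroup v1
   stat -f -c %T /sys/fs/cgroup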

@ryanmelnick

I’m curious, since the error ends with “permission denied”. Do you have sudo privileges in this environment? If you run the command with “sudo” does it work?

sudo docker run --entrypoint /bin/bash  --network=host -it --gpus all --rm nvcr.io/ea-reopt-member-zone/ea-cuopt

What kind of system are you running on? (OS, cloud platform or local, etc)

Running it on Ubuntu in Google Cloud Platform.

Sudo is no better; it has issues finding the image. I’ve also given the file in question read and write access for all users, without any luck.
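For reference, the permission change I mean was roughly this, using the path from the error above (the exact command may have differed):

   sudo chmod a+rw /sys/fs/cgroup/devices/user.slice/devices.allow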

I was able to work around this issue. But thank you.

@ryanmelnick glad to hear it! Can you elaborate on the fix, for the sake of other forum members that might hit this issue?

I’ll also post what I just did here for others, setting up a fresh Ubuntu box on GCP. Here are the steps I followed and links to the instructions:

System: Ubuntu 22.04 with a Tesla T4 GPU, 128 GB disk

Here is what I did, with the relevant command history included:

Installed the CUDA toolkit based on instructions at CUDA Toolkit 11.7 Update 1 Downloads | NVIDIA Developer

I used the “deb (local)” method and cut-and-pasted the commands

    4  sudo apt install gcc
    5  lspci | grep -i nvidia
   11  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
   12  sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
   13  wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda-repo-ubuntu2204-11-7-local_11.7.0-515.43.04-1_amd64.deb
   14  sudo dpkg -i cuda-repo-ubuntu2204-11-7-local_11.7.0-515.43.04-1_amd64.deb
   15  sudo cp /var/cuda-repo-ubuntu2204-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
   16  sudo apt-get update
   17  sudo apt-get -y install cuda

Checked that I now had working drivers:

   18  nvidia-smi

Installed docker based on instructions at Installing Docker and The Docker Utility Engine for NVIDIA GPUs — NVIDIA AI Enterprise documentation (these instructions are also available elsewhere)

   20  sudo apt-get update
   21  sudo apt-get install -y     apt-transport-https     ca-certificates     curl     gnupg-agent     software-properties-common
   22  curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
   23  sudo apt-key fingerprint 0EBFCD88
   24  sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
   27  sudo apt-get update
   28  sudo apt-get install -y docker-ce docker-ce-cli containerd.io
   29  sudo docker run hello-world

Installed nvidia-container-toolkit following the additional instructions at Installing Docker and The Docker Utility Engine for NVIDIA GPUs — NVIDIA AI Enterprise documentation

   32  distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
   33  curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
   34  curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
   35  sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
   36  sudo systemctl restart docker
   37  sudo docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
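As an optional extra check that the toolkit itself can see the driver (not part of the guide, just something I find useful; it may need sudo depending on device permissions):

   # should list the NVRM/driver version and the detected GPU(s)
   nvidia-container-cli info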

Added my user to the docker group, logged into NGC, and ran cuOpt:

   40  sudo usermod -aG docker $USER

   # here you need to log out and log back in

   43  docker login nvcr.io
   44  docker run --entrypoint /bin/bash  --network=host -it --gpus all --rm nvcr.io/ea-reopt-member-zone/ea-cuopt
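A small addition of my own, in case you want to confirm the group change took effect (not part of the original history):

   # after logging back in, 'docker' should appear in your group list
   id -nG
   # or pick up the new group in the current shell without logging out
   newgrp docker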

I do not have the details of the fix unfortunately.

But thank you for the detail; this is really excellent. I followed your instructions, which are very close to what we have. The only differences are that our system is…

Ubuntu 20.04
Tesla T4
rootless docker (not docker-ce)

On line 32 my distribution comes out as ubuntu20.04, which is correct.
But on line 34 the distribution that ends up in my nvidia-docker.list is ubuntu18.04. Is that correct behavior?
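For what it’s worth, this is how I checked both values (plain shell, nothing system-specific):

   echo $distribution
   cat /etc/apt/sources.list.d/nvidia-docker.list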

Then, running line 37, I get the same driver error message:

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled 

@ryanmelnick

Okay, I’m trying a 20.04 instance as well to see if I can reproduce…

But yes, even on the 22.04 instance that worked, my nvidia-docker.list also shows ubuntu18.04.

@ryanmelnick

I tried on 20.04, with both docker and docker-ce (and both nvidia-container-toolkit and nvidia-docker2 packages), and I can’t reproduce :(

Here is the relevant history from my latest attempt, with docker.io and nvidia-docker2:

    1  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
    2  sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
    3  wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda-repo-ubuntu2004-11-7-local_11.7.0-515.43.04-1_amd64.deb
    4  sudo dpkg -i cuda-repo-ubuntu2004-11-7-local_11.7.0-515.43.04-1_amd64.deb
    5  sudo cp /var/cuda-repo-ubuntu2004-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
    6  sudo apt-get update
    7  sudo apt-get -y install cuda
    8  nvidia-smi
   11  sudo apt-get install docker.io
   14  curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
   15  curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
   16  sudo apt-get update && sudo apt-get install -y nvidia-docker2
   17  sudo systemctl restart docker
   18  sudo docker run --gpus all nvidia/cuda:11.0-base nvidia-smi
   21  sudo usermod -aG docker $USER
   22  exit
   23  docker login nvcr.io
   24  docker run --entrypoint /bin/bash  --network=host -it --gpus all --rm nvcr.io/ea-reopt-member-zone/ea-cuopt

Not sure what to try next.

Ah, nvidia-docker2 installs nvidia-container-toolkit… so it’s essentially the same situation.

The only ways I seem to be able to get the “could not select device driver "" with capabilities: [[gpu]]” error are (quick checks for both are sketched after this list):

  1. do not install the nvidia-container-toolkit package
  2. or, install the package, but fail to restart docker
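The checks for those two cases would look roughly like this (generic commands, not tied to any particular setup):

   # is the toolkit package actually installed?
   dpkg -l nvidia-container-toolkit
   # and has docker been restarted since it was installed?
   sudo systemctl restart docker
   systemctl status docker --no-pager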

@ryanmelnick when you say rootless docker, do you mean something like this

or do you mean simply adding the user to the docker group like I have above?

Update: I switched to rootless docker using this page: How to do a Rootless Docker Installation?
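For anyone following along, a rootless install generally looks something like this (a sketch based on Docker’s rootless install script, not a copy of that page; details may differ):

   # install rootless docker for the current (non-root) user
   curl -fsSL https://get.docker.com/rootless | sh
   # point the client at the rootless daemon, as the install script suggests
   export PATH=$HOME/bin:$PATH
   export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock
   systemctl --user start docker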

At first I got the original error you reported (permission denied for /sys/fs/cgroup/devices/user.slice/devices.allow).
I fixed that by setting “no-cgroups = true” in /etc/nvidia-container-runtime/config.toml under [nvidia-container-cli].
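For anyone else making that change, the edit and a quick verification look like this (the option lives under the [nvidia-container-cli] section; in a default install it may be present but commented out):

   # set no-cgroups = true under [nvidia-container-cli]
   sudo nano /etc/nvidia-container-runtime/config.toml
   # verify the setting took
   grep -n no-cgroups /etc/nvidia-container-runtime/config.toml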

After that I was able to run the container using rootless docker.