could not select device driver "" with capabilities: [[gpu]].

Hello,

I’ve followed the steps outlined in GitHub - NVIDIA/nvidia-docker: Build and run Docker containers leveraging NVIDIA GPUs to set up the system and to start an nvidia-docker container.

Running deviceQuery from the cuda samples showed “Detected 1 CUDA Capable device(s)” and all the details of the GPU found.

“nvidia-smi” from the host CLI verifies the card and driver version (418.67).

However, when running “docker run --gpus all nvidia/cuda nvidia-smi”, I get the following error:
“docker: Error response from daemon: could not select device driver “” with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled”

I’ve restarted Docker as suggested in some troubleshooting hints, but that didn’t solve it. The other hint I found was to make sure the graphics card driver was installed, which “nvidia-smi” verified as far as I understand.

The environment:

  • Ubuntu 19.04 (I’ve seen it’s still beta but I just couldn’t update to 18.10)
  • cuda 10.1
  • Docker 19.03.1

Any help appreciated,
Mike

Please try installing nvidia-container-toolkit and restarting the Docker daemon as instructed here (should work for Ubuntu 19.04 too):
[url]https://github.com/NVIDIA/nvidia-docker#ubuntu-16041804-debian-jessiestretchbuster[/url]
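For quick reference, the setup from that page boils down to roughly the following (paraphrased from the nvidia-docker README of that era; verify the repo URL and package name against the linked page before running):

```shell
# Add NVIDIA's package repository for the container toolkit
distribution=$(. /etc/os-release; echo "$ID$VERSION_ID")
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list" | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the toolkit, then restart the daemon so it picks up the runtime hook
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```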

Related issue and answer:
[url]https://github.com/NVIDIA/nvidia-docker/issues/1034#issuecomment-520282450[/url]

Thanks for your reply, bkakilli. I went through that description, but it’s for Ubuntu 18.04. So when I got to…

“curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list”

…it errors out with…

Unsupported distribution!

Check Migration Notice | nvidia-docker

Just did a search on that and found that forcing the 18.04 version should work (Workstation Setup for Docker with the New NVIDIA Container Toolkit (nvidia-docker2 is deprecated)). I went through all the steps described and it seems OK.
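For anyone else on 19.04: the workaround amounts to pinning the distribution to the supported 18.04 value before fetching the repo list, something like this sketch (based on my reading of that post, not an official recipe):

```shell
# Force the ubuntu18.04 repo list, since no ubuntu19.04 list is published
distribution=ubuntu18.04
curl -s -L "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list" | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
```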

Glad it worked out!

I was able to resolve the error by restarting the daemon and Docker:
sudo apt install -y nvidia-docker2
sudo systemctl daemon-reload
sudo systemctl restart docker
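To confirm the fix took, a quick smoke test (the image tag here is just an example; use any CUDA base image you have access to):

```shell
# If the daemon restart worked, this prints the same nvidia-smi
# table you see on the host
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```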

Hello, so I had this issue, and it was resolved after installing the NVIDIA toolkit. Then I added a new disk and started having this issue again. Any idea how to solve this, please?

I was able to solve my issue by following this link: 20.04 - Docker only works with Nvidia drivers upon reinstall - Ask Ubuntu
The issue occurred because I had both docker-ce and the snap Docker installed.
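If you suspect the same dual-install conflict, one way to check for and remove the snap copy (a sketch, assuming you want to keep the apt-managed docker-ce install):

```shell
# See whether both a snap Docker and an apt docker-ce are installed
snap list docker 2>/dev/null
dpkg -l docker-ce 2>/dev/null

# If both show up, remove the snap one and restart the remaining daemon
sudo snap remove docker
sudo systemctl restart docker
```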

This issue has popped up again. Ubuntu 20.04, NVIDIA driver 535.86.05. The driver works on the host. I did not install Docker from snap.

$ nvidia-smi 
Mon Sep 18 12:08:34 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A3000 Laptop GPU    Off | 00000000:01:00.0  On |                  N/A |
| N/A   57C    P8              17W /  90W |    160MiB /  6144MiB |     26%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Cannot get the GPU to work with Docker. I have reinstalled Docker and reinstalled the nvidia-container-toolkit. No change.

$ docker run --rm  --gpus all ubuntu nvidia-smi
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

The hooks are all in place.

$ ls -al /usr/bin/nvidia-container*
-rwxr-xr-x 1 root root   47472 Sep  7 12:06 /usr/bin/nvidia-container-cli
-rwxr-xr-x 1 root root 3651080 Sep  7 12:07 /usr/bin/nvidia-container-runtime
-rwxr-xr-x 1 root root 2698280 Sep  7 12:07 /usr/bin/nvidia-container-runtime-hook
lrwxrwxrwx 1 root root      38 Sep 20  2022 /usr/bin/nvidia-container-toolkit -> /usr/bin/nvidia-container-runtime-hook
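When the binaries are present but the error persists, it can help to confirm that the Docker daemon actually knows about the runtime, and that the container CLI can talk to the driver on its own (diagnostic sketch):

```shell
# Ask the daemon which runtimes it has registered;
# "nvidia" should appear here if the toolkit is wired up
docker info --format '{{json .Runtimes}}'

# Query the NVIDIA container CLI directly; this bypasses Docker,
# so it separates driver problems from Docker configuration problems
nvidia-container-cli info
```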

Everything is at the latest versions:

$ sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Reading package lists... Done
Building dependency tree       
Reading state information... Done
containerd.io is already the newest version (1.6.24-1).
docker-buildx-plugin is already the newest version (0.11.2-1~ubuntu.20.04~focal).
docker-ce-cli is already the newest version (5:24.0.6-1~ubuntu.20.04~focal).
docker-ce is already the newest version (5:24.0.6-1~ubuntu.20.04~focal).
docker-compose-plugin is already the newest version (2.21.0-1~ubuntu.20.04~focal).

$ sudo apt-get install nvidia-container-toolkit
Reading package lists... Done
Building dependency tree       
Reading state information... Done
nvidia-container-toolkit is already the newest version (1.14.1-1).

And… let this be a lesson in the proper use of docker context.

The context was set to a remote machine. I ran “docker context use default” and that fixed the issue.
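In case it saves someone else the same head-scratching, checking which daemon your CLI is actually talking to is one command:

```shell
# List contexts; the active one is marked with an asterisk
docker context ls

# Switch back to the local daemon
docker context use default
```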

After installing the TensorRT container, when I try to run the command,

docker run --gpus all -it --rm nvcr.io/nvidia/tensorrt:23.10-py3

It throws an error as

docker: Error response from daemon: could not select device driver “” with capabilities: [[gpu]]

Hence I used docker run -it --rm nvcr.io/nvidia/tensorrt:23.10-py3 (without --gpus) to run the container.

My system package versions:

Cuda version - 12.2
Driver version - 535.129.03
TensorRT Container version - 23.10

Tried verifying whether the CUDA toolkit is installed inside the docker container using nvcc -V.

Terminal output, showing that the CUDA toolkit is installed inside the container:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

nvidia-smi works outside the docker container but not inside it.

Though the container shows driver and CUDA versions, when I try to run the Python or C++ samples from it, I always get a CUDA module/package-not-found error.

C++ error:

root@82eb7b4cf72d:/workspace/tensorrt/bin# ./sample_onnx_mnist
&&&& RUNNING TensorRT.sample_onnx_mnist [TensorRT v8601] # ./sample_onnx_mnist
[12/01/2023-12:30:41] [I] Building and running a GPU inference engine for Onnx MNIST
[12/01/2023-12:30:41] [W] [TRT] Unable to determine GPU memory usage
[12/01/2023-12:30:41] [W] [TRT] Unable to determine GPU memory usage
[12/01/2023-12:30:41] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 5, GPU 0 (MiB)
[12/01/2023-12:30:41] [W] [TRT] CUDA initialization failure with error: 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
&&&& FAILED TensorRT.sample_onnx_mnist

Python error:

root@82eb7b4cf72d:/workspace/tensorrt/samples/python/introductory_parser_samples# python onnx_resnet50.py
Traceback (most recent call last):
  File "/workspace/tensorrt/samples/python/introductory_parser_samples/onnx_resnet50.py", line 30, in <module>
    import common
  File "/workspace/tensorrt/samples/python/introductory_parser_samples/../common.py", line 25, in <module>
    from cuda import cuda, cudart
ModuleNotFoundError: No module named 'cuda'
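For the Python traceback specifically: the `cuda` module that common.py imports is provided by NVIDIA’s cuda-python package, so installing it inside the container should clear that particular ModuleNotFoundError (the underlying --gpus failure on the host still needs the toolkit fix, though):

```shell
# Install NVIDIA's Python CUDA bindings inside the container
pip install cuda-python

# Verify the import that the sample's common.py uses
python -c "from cuda import cuda, cudart"
```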

Let me know if any of you have faced this error and the fix for this error. Thanks in advance.

Yes, it was also solved on CentOS. :-)