command "docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi" fails with Error

We are trying to install nvidia-docker using the steps listed at the link below:
https://github.com/NVIDIA/nvidia-docker

We have verified that CUDA 10.0 is installed correctly, as can be seen in the terminal dump below.

Before running the command below, we ran the same command with CUDA 10.0 as the base image, i.e. docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi, and got the same error.

When we execute docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi, we get the following error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:424: container init caused "process_linux.go:407: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=9.0 --pid=6850 /var/lib/docker/overlay2/2cc9923c80fe972aeea84a85176845fc1cb2ad367161665faf5c4e00fbc4d966/merged]\\nnvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected\\n\""": unknown.

How can we fix this error?


riaz@riaz-X705UDR:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
riaz@riaz-X705UDR:~$ docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
riaz@riaz-X705UDR:~$ sudo apt-get purge -y nvidia-docker
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package 'nvidia-docker' is not installed, so not removed
The following packages were automatically installed and are no longer required:
libbsd0:i386 libdrm-amdgpu1:i386 libdrm-intel1:i386 libdrm-nouveau2:i386
libdrm-radeon1:i386 libdrm2:i386 libedit2:i386 libelf1:i386 libexpat1:i386
libffi6:i386 libgl1:i386 libgl1-mesa-dri:i386 libglapi-mesa:i386
libglvnd0:i386 libglx-mesa0:i386 libglx0:i386 libllvm6.0:i386
libnvidia-common-390 libpciaccess0:i386 libsensors4:i386 libstdc++6:i386
libwayland-client0:i386 libwayland-server0:i386 libx11-6:i386
libx11-xcb1:i386 libxau6:i386 libxcb-dri2-0:i386 libxcb-dri3-0:i386
libxcb-glx0:i386 libxcb-present0:i386 libxcb-sync1:i386 libxcb1:i386
libxdamage1:i386 libxdmcp6:i386 libxext6:i386 libxfixes3:i386
libxshmfence1:i386 libxxf86vm1:i386
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 13 not upgraded.
riaz@riaz-X705UDR:~$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
OK
riaz@riaz-X705UDR:~$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
riaz@riaz-X705UDR:~$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
deb https://nvidia.github.io/libnvidia-container/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
riaz@riaz-X705UDR:~$ sudo apt-get update
Get:1 file:/var/cuda-repo-10-0-local-10.0.130-410.48 InRelease
Ign:1 file:/var/cuda-repo-10-0-local-10.0.130-410.48 InRelease
Get:2 file:/var/cuda-repo-10-0-local-10.0.130-410.48 Release [574 B]
Hit:3 https://repo.skype.com/deb stable InRelease
Get:2 file:/var/cuda-repo-10-0-local-10.0.130-410.48 Release [574 B]
Ign:4 http://dl.google.com/linux/chrome/deb stable InRelease
Hit:5 https://download.docker.com/linux/ubuntu bionic InRelease
Hit:6 http://in.archive.ubuntu.com/ubuntu bionic InRelease
Get:7 http://security.ubuntu.com/ubuntu bionic-security InRelease [83.2 kB]
Hit:8 http://dl.google.com/linux/chrome/deb stable Release
Hit:9 https://nvidia.github.io/libnvidia-container/ubuntu18.04/amd64 InRelease
Get:11 http://in.archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Hit:12 https://nvidia.github.io/nvidia-container-runtime/ubuntu18.04/amd64 InRelease
Hit:13 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64 InRelease
Get:14 http://in.archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Fetched 247 kB in 2s (107 kB/s)
Reading package lists... Done
riaz@riaz-X705UDR:~$ sudo apt-get install -y nvidia-docker2
Reading package lists... Done
Building dependency tree
Reading state information... Done
nvidia-docker2 is already the newest version (2.0.3+docker18.09.1-1).
The following packages were automatically installed and are no longer required:
libbsd0:i386 libdrm-amdgpu1:i386 libdrm-intel1:i386 libdrm-nouveau2:i386
libdrm-radeon1:i386 libdrm2:i386 libedit2:i386 libelf1:i386 libexpat1:i386
libffi6:i386 libgl1:i386 libgl1-mesa-dri:i386 libglapi-mesa:i386
libglvnd0:i386 libglx-mesa0:i386 libglx0:i386 libllvm6.0:i386
libnvidia-common-390 libpciaccess0:i386 libsensors4:i386 libstdc++6:i386
libwayland-client0:i386 libwayland-server0:i386 libx11-6:i386
libx11-xcb1:i386 libxau6:i386 libxcb-dri2-0:i386 libxcb-dri3-0:i386
libxcb-glx0:i386 libxcb-present0:i386 libxcb-sync1:i386 libxcb1:i386
libxdamage1:i386 libxdmcp6:i386 libxext6:i386 libxfixes3:i386
libxshmfence1:i386 libxxf86vm1:i386
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 13 not upgraded.
riaz@riaz-X705UDR:~$ sudo pkill -SIGHUP dockerd
riaz@riaz-X705UDR:~$ docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:344: starting container process caused "process_linux.go:424: container init caused "process_linux.go:407: running prestart hook 1 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --device=all --compute --utility --require=cuda>=9.0 --pid=6850 /var/lib/docker/overlay2/2cc9923c80fe972aeea84a85176845fc1cb2ad367161665faf5c4e00fbc4d966/merged]\\nnvidia-container-cli: initialization error: cuda error: no cuda-capable device is detected\\n\""": unknown.
riaz@riaz-X705UDR:~$

It looks to me like either the GPU driver is not properly installed, or you have no CUDA-capable GPUs in your system.

A proper nvidia docker plugin installation starts with a proper CUDA install on the base machine.

nvcc --version

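is not sufficient to verify a proper install. It only reports the version of the CUDA compiler toolkit; it does not check that the GPU driver is loaded or that a device is visible.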
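
To tell those two cases apart, something like the following rough sketch could be run on the host (not inside a container). The report_check helper is a made-up name for illustration; lspci, lsmod and nvidia-smi are the standard tools, and I am assuming they are on your PATH. If the first check fails, the system does not see an NVIDIA GPU at all; if only the later ones fail, the driver is the more likely culprit.

```shell
#!/bin/sh
# Host-side sanity checks; run on the base machine, not in a container.

# report_check NAME CMD...  (hypothetical helper, not part of any NVIDIA tool)
# Runs CMD quietly and prints "ok: NAME" or "FAIL: NAME".
report_check() {
    name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok: $name"
    else
        echo "FAIL: $name"
    fi
}

# Is an NVIDIA device visible on the PCI bus at all?
report_check "NVIDIA GPU on PCI bus" sh -c 'lspci | grep -i nvidia'

# Is the NVIDIA kernel module loaded?
report_check "nvidia kernel module loaded" sh -c 'lsmod | grep "^nvidia"'

# Can the driver actually talk to the GPU?
report_check "nvidia-smi runs on the host" nvidia-smi
```

If nvidia-smi does not work on the host, it will not work inside the container either, so it makes sense to fix that first before retrying the docker command.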

You may wish to follow the sequence here:

https://docs.nvidia.com/ngc/ngc-titan-setup-guide/index.html

and verify your CUDA install using the methods provided in the linux install guide:

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions
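
As a rough illustration of those post-installation checks: the guide has you look at /proc/driver/nvidia/version to confirm the driver is loaded, and then build and run the deviceQuery sample. The sketch below does both. The driver_version function name and the SAMPLE_DIR default are my assumptions (the default is where the CUDA 10.0 samples typically end up after running the samples install script); adjust the path to wherever you copied the samples.

```shell
#!/bin/sh
# Sketch of the post-installation checks; paths are assumptions, adjust as needed.

# Assumed location of the deviceQuery sample; override via the environment.
SAMPLE_DIR="${SAMPLE_DIR:-$HOME/NVIDIA_CUDA-10.0_Samples/1_Utilities/deviceQuery}"

# driver_version [FILE]  (hypothetical helper)
# Prints the NVRM version line from FILE; fails if FILE is unreadable.
driver_version() {
    ver_file=${1:-/proc/driver/nvidia/version}
    [ -r "$ver_file" ] && grep 'NVRM version' "$ver_file"
}

# Step 1: is the NVIDIA kernel driver loaded?
if line=$(driver_version); then
    echo "driver loaded: $line"
else
    echo "no NVIDIA kernel driver loaded (cannot read /proc/driver/nvidia/version)"
fi

# Step 2: build and run deviceQuery, which should list the detected GPUs.
if [ -d "$SAMPLE_DIR" ]; then
    (cd "$SAMPLE_DIR" && make && ./deviceQuery) || echo "deviceQuery failed"
else
    echo "deviceQuery sample not found at $SAMPLE_DIR"
fi
```

Only once the driver check passes and deviceQuery reports your GPU would I expect the nvidia-docker command to succeed.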