Yet another "Driver Not Loaded / can't communicate with the NVIDIA driver" error while trying to deploy a docker container with GPU support on WSL2

Hello.

I’m trying to deploy a docker container with GPU support on Windows Subsystem for Linux.

These are the commands that I have issued (taken from here: https://dilililabs.com/zh/blog/2021/01/26/deploying-docker-with-gpu-support-on-windows-subsystem-for-linux/):

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
sudo apt-get update

sudo apt-get install cuda-toolkit-11-0
curl https://get.docker.com | sh
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
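In case anyone wonders what that `distribution` line produces: it sources `/etc/os-release` in a subshell and concatenates the `ID` and `VERSION_ID` fields. A minimal sketch (it copies the two relevant fields into a temp file so it runs anywhere; the values assume Ubuntu 20.04):

```shell
# Recreate the two relevant /etc/os-release fields (assumption: on Ubuntu
# 20.04 the real file contains ID=ubuntu and VERSION_ID="20.04")
cat > /tmp/os-release <<'EOF'
ID=ubuntu
VERSION_ID="20.04"
EOF

# Same trick as the setup command: source the file in a subshell,
# then join the two fields
distribution=$(. /tmp/os-release; echo $ID$VERSION_ID)
echo "$distribution"   # ubuntu20.04
```

That `ubuntu20.04` string is what gets substituted into the nvidia.github.io repository URLs in the next commands.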

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list

sudo apt-get update

sudo apt-get install nvidia-docker2 cuda-toolkit-11-0 cuda-drivers

sudo service docker start
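As a sanity check after those installs: the nvidia-docker2 package registers an extra runtime with the Docker daemon through /etc/docker/daemon.json. A sketch of what the file typically contains after installation (exact contents may differ per package version):

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
```

If that runtime entry is missing, `--gpus all` has nothing to hand the GPU request to.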

I’m not able to run this docker container:

docker run --rm --gpus all nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04

Unable to find image 'nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04' locally
11.0-cudnn8-devel-ubuntu18.04: Pulling from nvidia/cuda
171857c49d0f: Pull complete
419640447d26: Pull complete
61e52f862619: Pull complete
2a93278deddf: Pull complete
c9f080049843: Pull complete
8189556b2329: Pull complete
c306a0c97a55: Pull complete
4a9478bd0b24: Pull complete
19a76c31766d: Pull complete
Digest: sha256:11777cee30f0bbd7cb4a3da562fdd0926adb2af02069dad7cf2e339ec1dad036
Status: Downloaded newer image for nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.

In addition:

root@DESKTOP-N9UN2H3:/mnt/c/Program Files/cmder# nvidia-smi

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Failed to properly shut down NVML: Driver Not Loaded

(I’m using Windows 10 build 21376co_release.210503-1432.)

On the host I have installed the nvidia driver vers. 470.14 and inside WSL2 I have ubuntu 20.04.

This is a known bug in the latest nvidia docker libs. It will be fixed in an upcoming Windows driver, but for now see the workaround in nvidia-docker 2.6.0-1 - not working on Ubuntu WSL2 · Issue #1496 · NVIDIA/nvidia-docker (github.com)

The workaround is to set NVIDIA_DISABLE_REQUIRE=1? What happens if I disable NVIDIA? Which function will I lose? It does not seem like a workaround if it disables functionality. I’ve used the standalone version of Docker for Windows and it worked: with that I can correctly run containers with GPU support. That seems to be the real workaround.

No, the workaround is installing the previous libraries version with
sudo apt-get install nvidia-docker2:amd64=2.5.0-1 nvidia-container-runtime:amd64=3.4.0-1 nvidia-container-toolkit:amd64=1.4.2-1 libnvidia-container-tools:amd64=1.3.3-1 libnvidia-container1:amd64=1.3.3-1
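One caveat with the downgrade: a later `apt-get upgrade` will pull the broken versions back in unless you hold the packages, e.g. with `sudo apt-mark hold nvidia-docker2 nvidia-container-runtime nvidia-container-toolkit libnvidia-container-tools libnvidia-container1`, or with a pin file. A sketch of such a file (hypothetical path /etc/apt/preferences.d/nvidia-docker, assuming exactly the versions above):

```
Package: nvidia-docker2
Pin: version 2.5.0-1
Pin-Priority: 1001

Package: nvidia-container-runtime
Pin: version 3.4.0-1
Pin-Priority: 1001

Package: nvidia-container-toolkit
Pin: version 1.4.2-1
Pin-Priority: 1001

Package: libnvidia-container-tools libnvidia-container1
Pin: version 1.3.3-1
Pin-Priority: 1001
```

Priority 1001 forces apt to keep (or even downgrade to) the pinned version over anything newer in the repo.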

NVIDIA_DISABLE_REQUIRE=1 doesn’t disable anything important, it just ignores the CUDA version check. It’s needed because in WSL2 the CUDA version is always incorrectly reported as version 11 by docker.
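For reference, it is just an environment variable passed into the container with `-e`. A sketch (the actual docker invocation assumes a working WSL2 GPU setup and the image from this thread, so it is guarded on the WSL2 GPU device /dev/dxg):

```shell
# The variable simply travels into the child process environment:
NVIDIA_DISABLE_REQUIRE=1 sh -c 'echo "inside child: $NVIDIA_DISABLE_REQUIRE"'

# Actual run (guarded: /dev/dxg only exists in a WSL2 distro with GPU
# passthrough; image name taken from this thread)
if [ -e /dev/dxg ] && command -v docker >/dev/null 2>&1; then
  docker run --rm --gpus all -e NVIDIA_DISABLE_REQUIRE=1 \
    nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04 nvidia-smi
fi
```

With the variable set, nvidia-container-cli skips only the `cuda>=X` requirement check baked into the image; the driver and GPU are still used normally.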


What do you think about Docker for Windows? Does it support the GPU or not? I read that it does.

I’m using Docker Desktop 3.3.1 and GPU works because it uses older nvidia libraries. You may need NVIDIA_DISABLE_REQUIRE=1 depending on the docker image you are running.

I’ve just downgraded the packages as you suggested and this is what happened:

root@DESKTOP-N9UN2H3:/mnt/c/Program Files/cmder# nvidia-smi

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Failed to properly shut down NVML: Driver Not Loaded

root@DESKTOP-N9UN2H3:/mnt/c/Program Files/cmder# docker run --rm --gpus all nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04

docker: Error response from daemon: dial unix /mnt/wsl/docker-desktop/shared-sockets/guest-services/docker.sock: connect: no such file or directory.
See ‘docker run --help’.

UPDATE :

root@DESKTOP-N9UN2H3:/mnt/c/Program Files/cmder# sudo service docker start

  • Starting Docker: docker [ OK ]

root@DESKTOP-N9UN2H3:/mnt/c/Program Files/cmder# docker run --rm --gpus all nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04

Nothing happened. How can I log in to the docker image?

nvidia-smi is broken and the next driver update should fix it.

It looks like you have both the Nvidia docker and Docker Desktop. You can’t use both at the same time. Go to Docker Desktop options RESOURCES → WSL INTEGRATION and disable docker for the WSL2 distro you are running and try again.

Like this?

This is what happens:

root@DESKTOP-N9UN2H3:/mnt/c/Program Files/cmder# sudo service docker start

  • Starting Docker: docker [ OK ]

root@DESKTOP-N9UN2H3:/mnt/c/Program Files/cmder# docker run --rm --gpus all nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04

Nothing.

You can login to the docker image with
docker run --rm -it --gpus all nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04 bash

Yes, after terminating and relaunching the running WSL2 distro it should work. What’s the output of ls -la $(which docker) ?
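Context for that question (a sketch; path is typical, not guaranteed): with Docker Desktop's WSL integration enabled, the `docker` on the distro's PATH is a symlink pointing under /mnt/wsl/docker-desktop/, while the native apt package installs a regular /usr/bin/docker, so the listing shows which daemon you are actually talking to:

```shell
# Resolve the docker binary on PATH; fall back to a message if it is missing
docker_path=$(command -v docker || true)
if [ -n "$docker_path" ]; then
  # a "-> /mnt/wsl/docker-desktop/..." arrow would mean Docker Desktop's CLI
  ls -la "$docker_path"
else
  echo "docker is not on PATH"
fi
```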

With the bash argument at the end of the command it worked… you are great.
