Guide to running CUDA + WSL + Docker with the latest versions (Windows build 21382 + NVIDIA driver 470.14)

Wow, indeed it works with Docker 3.3.0!
Many, many thanks, gurveshsanghera!!!
:-) :-) :-)

You are the best! It works!!!
I actually don’t seem to need the --env NVIDIA_DISABLE_REQUIRE=1 flag.

Thanks a lot. It works.
I do need the --env NVIDIA_DISABLE_REQUIRE=1 flag.

Thanks for the guide! The benchmark is up and working for me when I run from Windows; however, when I try to run the third example (CUDA on WSL :: CUDA Toolkit Documentation) I get:

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use 'nvidia-docker run' to start this container

Were you able to run that example correctly and get the NVIDIA drivers detected?

Thanks a lot for this guide!!! It really helps!

Hi - I was able to get the TensorFlow Docker image running, if that’s your question. I checked that the GPU was exposed in it and was able to run some test cases. I was also able to install CUDA natively in WSL (and also on Windows) and run the BlackScholes example (on both).
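In case it helps anyone repeating that GPU check: this is roughly what I would run inside the TensorFlow container. It is just a sketch assuming a TF 2.x image (list_physical_devices is the TF 2 API), not the exact commands from the original post:

# Inside the running TensorFlow container: list the GPUs TensorFlow can see
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
# Confirm the driver is visible to the container at all
nvidia-smi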

I can get the Docker containers running, but when I use this command:

docker run --gpus all -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tensorflow:20.03-tf2-py3

the warning appears when the container starts, and from there I haven’t been able to run the ResNet examples that NVIDIA gave in their tutorial.

I think you are missing the --env NVIDIA_DISABLE_REQUIRE=1 flag. You now need it for every Docker container in which you want to use the GPU.

This is the command I ran, FYI:

docker run -it --env NVIDIA_DISABLE_REQUIRE=1 --gpus all --name tf1 -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter

Sorry, I forgot to include that in the post, but I do use that flag when I start my container. TensorFlow can find the GPU, so I guess it works; I just don’t know if I’ll be able to use TensorRT in the container, since it says it can’t find the drivers.

So - I’ve been thinking - why do we need to run containers at all now? I am also able to get TensorFlow running natively on Windows (actually that’s what I have been doing the last few days), since they (and also PyTorch) now support Windows natively. And everything seems to work just fine.

Maybe using Docker via WSL just adds too much complexity…

Anyway… all the best!


I’m actually going to be running containers in the cloud, so having the same environment locally is a huge win for me. I did go through the steps in the original post and everything worked from a GPU perspective! However, I had to roll back because the latest Windows build caused my system to hang and crash every 5 minutes or so. :( So I’m back to square one.

Thank you very much!

Thanks, installing 3.0.0 did the trick.

For anyone who, after running the above docker run command in WSL 2, got this error (like I did):

permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.39/containers/json: dial unix /var/run/docker.sock: connect: permission denied

The reason is that the user running the command is not a member of the docker group. To add yourself to the docker group, run:

sudo usermod -a -G docker $USER

Then logout and log back in.

After logging back in, confirm you are a member of the docker group by typing:

$ groups

You should see “docker” in the list displayed.
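If you want to double-check that the daemon is now reachable without sudo, a quick test (using Docker’s standard hello-world image; any image you already have pulled works just as well) is:

# Should print the hello-world greeting without a permission error
docker run --rm hello-world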

Otherwise, worked great! Just remember: if you copy & paste Docker CUDA sample command lines from the NVIDIA pages, add --env NVIDIA_DISABLE_REQUIRE=1 to the line you copied.
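For example, this is roughly what the N-body benchmark line from NVIDIA’s WSL guide looks like with the flag added (the image tag is the one the guide used at the time; adjust it to whichever sample you actually copied):

# NVIDIA's N-body sample with the extra flag inserted before the image name
docker run --gpus all --env NVIDIA_DISABLE_REQUIRE=1 nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark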

2021-06-27: the latest version, Docker Desktop v3.5.1 (66090), works!


Docker Desktop 3.5.1 66090 worked. Thanks!

Hi - to anyone reading this now - install Docker Desktop 3.5.1.66090 or higher directly. I have verified both 3.6 and 4.0, which both seem to work fine and no longer require the --env NVIDIA_DISABLE_REQUIRE=1 flag.

Hi, I have tested Docker Desktop 4.0 with Windows 11, and it works.

But I can’t use any CUDA image from the hub.
I have tried:

docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:21.08-py3 bash
docker run --gpus all -it --rm nvidia/cuda:11.4.1-devel-ubuntu18.04 bash
docker run --gpus all -it --rm nvidia/cuda:10.2-devel-ubuntu18.04 bash

None of these can run nvidia-smi.
And there are some error messages shown by the nvcr.io/nvidia/pytorch:21.08-py3 container:

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use 'nvidia-docker run' to start this container; see
   https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for PyTorch.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --ipc=host ...

Which deep learning images can be used?

Hello, everyone. Regarding the issue where nvidia-smi can’t be used in nvcr.io/nvidia/pytorch:21.08-py3: there may be a bug in Docker for Windows.
I have tried the method in CUDA on WSL :: CUDA Toolkit Documentation (nvidia.com) to install nvidia-docker in WSL 2, and it works.
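For reference, this is roughly what that guide had you run inside the WSL 2 Ubuntu distribution at the time (a sketch from memory; the packaging has since moved to the NVIDIA Container Toolkit, so check the linked documentation for the current steps):

# Install Docker inside WSL 2 itself (not Docker Desktop)
curl https://get.docker.com | sh
# Add the nvidia-docker2 repository for this distribution and install it
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-docker2
# Restart the Docker daemon so it picks up the NVIDIA runtime
sudo service docker stop && sudo service docker start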
When you use nvidia-docker, note:

When running the NGC Deep Learning (DL) Framework GPU containers in WSL 2, you may encounter a message:
The NVIDIA Driver was not detected.  GPU functionality will not be available.
                
Note that this message is an incorrect warning for WSL 2 and will be fixed in future releases of the DL Framework containers to correctly detect the NVIDIA GPUs. The DL Framework containers will still continue to be accelerated using CUDA on WSL 2.

I installed Docker Desktop 4.24.06 on Windows 10 19042.928. The graphics card driver is 537.42, and the following error is reported. How should I solve it?

The image has been installed without any problems, but when I run:

docker run --gpus all

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.