Docker and nvidia-smi not working with clean install on Driver 470.14 and Insider Preview (Build 21343) Ubuntu 20.04

Windows Insider Preview Build 21343.rs_prerelease.210320-1757
Nvidia driver 470.14
GeForce RTX 3090
Using WSL2
Cuda toolkit 11.2

dpkg -l | grep nvidia
ii  libnvidia-container-tools       1.3.3-1                           amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64      1.3.3-1                           amd64        NVIDIA container runtime library
ii  nvidia-container-runtime        3.4.2-1                           amd64        NVIDIA container runtime
ii  nvidia-container-toolkit        1.4.2-1                           amd64        NVIDIA container runtime hook
ii  nvidia-docker2                  2.5.0-1                           all          nvidia-docker CLI wrapper

I followed the instructions at CUDA on WSL :: CUDA Toolkit Documentation with a fresh WSL2 and Ubuntu setup, and then followed the toolkit installation instructions found here: CUDA Toolkit 11.2 Update 2 Downloads | NVIDIA Developer, except for the last step. Instead of just installing cuda (my impression was that we didn’t want to do that on WSL?), I ran sudo apt-get install cuda-toolkit-11-2.
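For reference, a sketch of that last step, assuming the CUDA network repository for WSL-Ubuntu is already configured per NVIDIA's instructions. The distinction matters: the cuda metapackage would also pull in the Linux display driver, which should not be installed inside WSL2 (the Windows driver provides the GPU stack), while cuda-toolkit-11-2 installs only the user-space toolkit:

```shell
# Install only the user-space CUDA toolkit, not the Linux driver.
# Assumes the CUDA WSL-Ubuntu apt repository is already set up.
sudo apt-get update
sudo apt-get install -y cuda-toolkit-11-2   # toolkit only; no driver
# NOT: sudo apt-get install cuda            # would also install the driver
```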

Happy to provide more info. It’s disappointing that this isn’t working, since I was hoping to do some serious work on my Windows install using the new WSL2 Docker GPU support.

Main problems:
Running the docker GPU example does not work:

sudo docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
[sudo] password for nivintw:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.2, please update your driver to a newer version, or use an earlier cuda container: unknown.
ERRO[0000] error waiting for container: context canceled

nvidia-smi does not work (following the instructions for copying and updating permissions):

nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Failed to properly shut down NVML: Driver Not Loaded
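For this failure mode, one thing worth checking is whether WSL2 has actually mounted the Windows driver's user-space libraries into the distro; on recent WSL builds they appear under /usr/lib/wsl/lib (treat that path as an assumption for this build). A minimal check:

```shell
# Check whether the Windows GPU driver libraries are visible inside WSL2.
# /usr/lib/wsl/lib is where current WSL builds mount them (assumed path).
WSL_LIB=/usr/lib/wsl/lib
if [ -x "$WSL_LIB/nvidia-smi" ]; then
  gpu_status=mounted
else
  gpu_status=missing
fi
echo "WSL driver libraries: $gpu_status"
```

If it reports missing, the problem is likely on the Windows side (driver version or an outdated WSL kernel) rather than in the Ubuntu CUDA packages.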

Further info:
I was able to launch the tensorflow container in the documentation and allocate tensors on the GPU, so that container seems to be functional, although the following code took MUCH longer than expected (minutes) to execute:

import tensorflow as tf

a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.random.uniform(shape=[3, 2])
c = tf.matmul(a, b)

print(c)

Executing op RandomUniform in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Sub in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Mul in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Add in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[2.872214  3.5481367]
 [6.7865934 8.095843 ]], shape=(2, 2), dtype=float32)
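One possible explanation for the minutes-long delay is one-time kernel/PTX compilation for a GPU architecture the container's CUDA libraries don't ship prebuilt binaries for (the RTX 3090 is Ampere), rather than slow execution per se. Timing the same op twice would separate the two; a small sketch with a generic helper (the commented TF lines are assumptions about your session):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Against the snippet above (inside the container, tensorflow assumed):
#   _, first = timed(tf.matmul, a, b)   # may include one-time JIT compilation
#   _, second = timed(tf.matmul, a, b)  # steady-state cost; should be fast
```

If the second call is fast, the delay is startup cost, not a broken GPU path.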

+1 would love to know if you found a solution


+2

The requirement error is a known bug in the latest Docker. Try something like:

docker run --gpus all --env NVIDIA_DISABLE_REQUIRE=1 nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark