Windows Insider Preview Build 2143.rs_prerelease.210320-1757
Nvidia driver 470.14
GeForce RTX 3090
Cuda toolkit 11.2
dpkg -l | grep nvidia
ii  libnvidia-container-tools       1.3.3-1  amd64  NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64      1.3.3-1  amd64  NVIDIA container runtime library
ii  nvidia-container-runtime        3.4.2-1  amd64  NVIDIA container runtime
ii  nvidia-container-toolkit        1.4.2-1  amd64  NVIDIA container runtime hook
ii  nvidia-docker2                  2.5.0-1  all    nvidia-docker CLI wrapper
I followed the instructions at CUDA on WSL :: CUDA Toolkit Documentation with a fresh WSL2 and Ubuntu setup, then followed the toolkit installation instructions at CUDA Toolkit 11.2 Update 2 Downloads | NVIDIA Developer, except for the last step. Instead of installing the full cuda package (my impression was that we don't want that on WSL, since it would pull in the Linux driver?), I ran:
sudo apt-get install cuda-toolkit-11-2
Happy to provide more info. It's disappointing that this isn't working, since I was hoping to do some serious work on my Windows install using the new WSL2 Docker GPU support.
Running the Docker GPU example fails:
sudo docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
[sudo] password for nivintw:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.2, please update your driver to a newer version, or use an earlier cuda container: unknown.
ERRO error waiting for container: context canceled
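The stderr line is the key part: nvidia-container-cli compares the CUDA version reported by the driver against the container's cuda>=11.2 requirement, and the preview driver is apparently reporting something older. The check is an ordinary dotted-version comparison — a sketch of the idea (not the actual implementation), plus a commonly suggested workaround:

```shell
# Dotted-version comparison of the kind nvidia-container-cli performs for
# the "cuda>=11.2" requirement (sketch; not the real implementation).
ver_ge() {
  # succeed when $1 >= $2 under version ordering
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

ver_ge 11.2 11.2 && echo "cuda>=11.2 satisfied"
ver_ge 11.1 11.2 || echo "cuda>=11.2 NOT satisfied"

# A commonly suggested workaround (assumption: honored by this version of the
# container toolkit) is to relax the requirement check via an env var:
#   sudo docker run --gpus all -e NVIDIA_DISABLE_REQUIRE=1 \
#     nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
```

That workaround only masks the version check, of course — if the driver genuinely isn't exposing CUDA 11.2, the container may still fail at runtime.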
nvidia-smi does not work (after following the instructions for copying the binary and updating permissions):
nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Failed to properly shut down NVML: Driver Not Loaded
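On these preview builds the driver lives on the Windows side and exposes its user-space libraries (including the nvidia-smi binary the copy instructions refer to) to WSL, which I believe lands under /usr/lib/wsl/lib. Before chasing driver problems inside the distro, it's worth checking that those files are actually visible — a quick sketch (path and file names are my assumption; adjust for your driver build):

```shell
# Check that the Windows driver's WSL libraries are visible from the distro.
# (Assumed location: /usr/lib/wsl/lib; file names may differ per driver build.)
check_gpu_files() {
  dir=$1
  status=0
  for f in nvidia-smi libcuda.so.1; do
    if [ -e "$dir/$f" ]; then
      echo "found $dir/$f"
    else
      echo "missing $dir/$f"
      status=1
    fi
  done
  return $status
}

check_gpu_files /usr/lib/wsl/lib || echo "driver libraries not exposed to WSL"
```

If the files are missing entirely, the problem is on the Windows driver / WSL side rather than anything installed inside Ubuntu.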
I was able to launch the TensorFlow container from the documentation and allocate tensors on the GPU, so that container seems to be functional, although the following code took MUCH longer than expected (minutes) to execute:
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.random.uniform(shape=[3,2])
c = tf.matmul(a, b)
print(c)

Executing op RandomUniform in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Sub in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Mul in device /job:localhost/replica:0/task:0/device:GPU:0
Executing op Add in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor(
[[2.872214  3.5481367]
 [6.7865934 8.095843 ]], shape=(2, 2), dtype=float32)
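A multi-minute first op could be a one-time cost rather than a persistent slowdown — my guess (an assumption, not confirmed) is JIT compilation: the RTX 3090 is compute capability 8.6, newer than many 11.x-era container images, so the driver may be compiling PTX for every kernel at first use. One way to distinguish warm-up cost from a real slowdown is simply to time the same operation twice; a framework-agnostic sketch (the workload here is a stand-in, not a GPU op):

```shell
# Time the same command twice; on a real GPU workload the first run would
# include any one-time JIT/warm-up cost, the second run only steady-state
# cost. (Stand-in workload: a sleep; uses GNU date's nanosecond format.)
elapsed_ms() {
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

first=$(elapsed_ms sleep 0.2)
second=$(elapsed_ms sleep 0.2)
echo "first run: ${first} ms, second run: ${second} ms"
```

If a second, identical tf.matmul runs in milliseconds, the minutes-long delay was warm-up; if it's still slow, something else is wrong.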