NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

onomatopellan · September 11, 2020, 11:47pm

Thanks for the steps. I can’t reproduce it though. Docker works for me following your steps.
What’s your GPU? Do you have more than one? It seems your GPU doesn’t appear in WSL2. Can you post the output of these?

cmd.exe /c ver (just the version build)
ll /usr/lib/wsl/lib/
mount | grep lib
lspci | grep 3D

This is how it looks:

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "GeForce GT 710" with compute capability 3.5

> Compute 3.5 CUDA device: [GeForce GT 710]
1024 bodies, total time for 10 iterations: 1.331 ms
= 7.876 billion interactions per second
= 157.520 single-precision GFLOP/s at 20 flops per interaction
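
For anyone collecting the same information, the four checks requested earlier in this post can be bundled into one report. This is only a sketch: `run_check` and `wsl_gpu_report` are hypothetical helper names, `ls -al` stands in for the `ll` alias (which is usually not defined in scripts), and running `cmd.exe` from `/mnt/c` assumes drive C: is mounted at the WSL default location.

```shell
# Run one diagnostic command under a labelled header.
# A failing command (or a grep with no match) is reported instead of aborting.
run_check() {
    label=$1
    shift
    printf '== %s ==\n' "$label"
    if ! "$@" 2>&1; then
        printf '(no output or non-zero exit from: %s)\n' "$*"
    fi
}

# Collect the four diagnostics asked for in the post.
# Assumption: C: is automounted at /mnt/c, so cmd.exe starts in a Windows directory.
wsl_gpu_report() {
    run_check "Windows version" sh -c 'cd /mnt/c && cmd.exe /c ver'
    run_check "/usr/lib/wsl/lib" ls -al /usr/lib/wsl/lib/
    run_check "mounts mentioning lib" sh -c 'mount | grep lib'
    run_check "3D controllers" sh -c 'lspci | grep 3D'
}
```

After sourcing the snippet inside the WSL2 distro, `wsl_gpu_report` prints all four sections, and an empty `lspci | grep 3D` shows up as a reported failure rather than silence.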

chestnut890123 · September 11, 2020, 11:59pm

Thanks for running! Mine are an RTX 2080 Ti (device 0) and a GT 1030 (device 1).

The output looks like this:

$ cmd.exe /c ver
'\\wsl$\Ubuntu-18.04\home\zw'
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported.  Defaulting to Windows directory.

$ ll /usr/lib/wsl/lib/
ls: cannot access '/usr/lib/wsl/lib/': No such file or directory

$ mount | grep lib
/dev/sdb on /var/lib/docker type ext4 (rw,relatime,discard,errors=remount-ro,data=ordered)

$ lspci | grep 3D
(nothing returns)
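
The UNC warning above appears because cmd.exe was started from a directory inside the Linux filesystem, and it hides the version line that was actually requested. A possible workaround, sketched below, is to start cmd.exe from a Windows-backed directory and keep only the build number. The names `parse_build` and `windows_build` are hypothetical, and `/mnt/c` is assumed to be where drive C: is mounted (the WSL default).

```shell
# Extract the build number (third dotted field) from `ver` output on stdin,
# e.g. "Microsoft Windows [Version 10.0.19041.508]" gives 19041.
parse_build() {
    tr -d '\r' | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1 | cut -d. -f3
}

# Run `ver` from /mnt/c so cmd.exe has no UNC current directory to reject.
# Assumption: C: is automounted at /mnt/c.
windows_build() {
    (cd /mnt/c && cmd.exe /c ver) 2>/dev/null | parse_build
}
```

Calling `windows_build` from any directory in the distro should then print just the build number, with no UNC message in front of it.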

onomatopellan · September 12, 2020, 12:15am

Yep, definitely none of your GPUs appear in WSL2. What’s your Windows build? (winver.exe)

I know SLI needs to be disabled in order for CUDA to work in WSL2, but I don’t know what to do with 2 GPUs.
You could try something like this and assign one GPU for Ubuntu1804.exe?

When WSL2 detects a GPU it looks like this:

$ ll /usr/lib/wsl/lib/
total 68716
dr-xr-xr-x 1 root root     4096 Sep 11 13:17 ./
drwxr-xr-x 4 root root     4096 Sep 12 01:15 ../
-r--r--r-- 1 root root   124664 Aug 30 15:51 libcuda.so
-r--r--r-- 1 root root   124664 Aug 30 15:51 libcuda.so.1
-r--r--r-- 1 root root   124664 Aug 30 15:51 libcuda.so.1.1
-r--r--r-- 2 root root   832936 Sep  5 15:41 libd3d12.so
-r--r--r-- 2 root root  5046944 Sep  5 15:41 libd3d12core.so
-r--r--r-- 2 root root 22716112 Sep  5 15:41 libdirectml.so
-r--r--r-- 2 root root   878768 Sep  5 15:41 libdxcore.so
-r--r--r-- 1 root root 40496936 Aug 30 15:51 libnvwgf2umx.so

chestnut890123 · September 12, 2020, 12:35am

My winver is 2004 (OS Build 19041.508).

I set the GT 1030 as my display card (connected to the monitor) and the 2080 Ti as the compute card (just attached to a PCIe slot).

I tried to assign a GPU to both Ubuntu and WSL, but the setting only lets me choose the GT 1030, even though both of my cards are recognized. Even after I chose the GT 1030, it still gave me the error:

$ ll /usr/lib/wsl/lib/
ls: cannot access '/usr/lib/wsl/lib/': No such file or directory

onomatopellan · September 12, 2020, 12:40am

Build 19041.508

Ouch. That was the problem.
You need to be on a Windows 10 Insider Dev build (like build 20211) in order to use the GPU inside WSL2.
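
A build number can be compared against a minimum before spending time on drivers or Docker. The sketch below is an illustration, not an official rule: `build_supports_wsl_gpu` is a hypothetical name, and the default threshold of 20150 is an assumption that does not come from this thread, which only confirms that 19041.508 fails and that a Dev build such as 20211 is expected to work. Override `MIN_WSL_GPU_BUILD` if a different cutoff applies.

```shell
# ASSUMPTION: 20150 as the minimum Insider build is not stated in the thread;
# the thread only shows 19041 failing and names 20211 as a suitable Dev build.
MIN_WSL_GPU_BUILD=${MIN_WSL_GPU_BUILD:-20150}

# Return 0 if the given Windows build number meets the assumed minimum.
build_supports_wsl_gpu() {
    [ "$1" -ge "$MIN_WSL_GPU_BUILD" ]
}
```

With the default threshold, `build_supports_wsl_gpu 19041` fails and `build_supports_wsl_gpu 20211` succeeds, matching what happened in this thread.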

chestnut890123 · September 12, 2020, 1:15am

Aha! I see, I will download the Dev build.
Thanks so much for following up. I will keep you posted once I have an answer.

chestnut890123 · September 13, 2020, 12:19pm

After I installed the Windows Insider build, everything works great! Thanks again for your help!

djmedite · November 30, 2020, 8:14am

nvidia-utils-450: /usr/bin/nvidia-smi

onomatopellan · December 7, 2020, 11:40am

That means you have installed the NVIDIA driver inside WSL2.
For CUDA to work in WSL2 you need to uninstall it with sudo apt remove nvidia-driver-450
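
The line djmedite posted looks like `dpkg -S` output, which names the package that owns a file. To see every Linux-side NVIDIA driver or utility package still installed in the distro, something like the sketch below may help. `installed_only` and `list_linux_nvidia_packages` are hypothetical names, and the two package-name patterns are assumptions based on the `nvidia-utils-450` and `nvidia-driver-450` names seen in this thread.

```shell
# Keep only fully installed packages ("ii") from "status package" lines on stdin.
installed_only() {
    awk '$1 == "ii" { print $2 }'
}

# List installed packages matching the assumed nvidia-driver-* / nvidia-utils-* patterns.
# dpkg-query reports an error when nothing matches, so stderr is discarded.
list_linux_nvidia_packages() {
    dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' \
        'nvidia-driver-*' 'nvidia-utils-*' 2>/dev/null | installed_only
}
```

Anything `list_linux_nvidia_packages` prints is a candidate for `sudo apt remove`, as suggested above. Empty output means no matching package is installed.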

djmedite · December 7, 2020, 2:10pm

I’ve done that, but the command “nvidia --smi” also returns “nvidia: command not found”

onomatopellan · December 7, 2020, 2:32pm

The correct command is nvidia-smi, or you can call the Windows version from bash with nvidia-smi.exe.

asundaram · December 7, 2020, 8:39pm

Hi djmedite

As our user guide notes, there should be no nvidia-smi packaged inside the WSL2 Linux distribution. If you have /usr/bin/nvidia-smi, that indicates an incorrect installation. nvidia --smi is not a valid command, but as mentioned, running nvidia-smi from inside WSL2 should also report “nvidia-smi: command not found”.

lkeyes · March 4, 2021, 12:08am

I had a similar issue in WSL2. I found that
$ nvidia-smi.exe
worked within the Ubuntu command-line environment, but ‘nvidia-smi’ gave me that error.
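
Scripts that need to work in both situations can pick whichever command exists. The sketch below only checks that a name resolves on the PATH, so a Linux-side nvidia-smi that is present but broken would still be selected if listed first. That is why the Windows executable is tried first in the usage line. `first_available` is a hypothetical helper name.

```shell
# Print the first argument that resolves to a command on the PATH.
# Returns non-zero if none of the candidates exist.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

A typical call would be `smi=$(first_available nvidia-smi.exe nvidia-smi) && "$smi"`, which prefers the Windows version that worked in this thread.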