470.14 - WSL with W10 Build 21343 - NVIDIA-SMI error

Dear Sir,

I did that and updated the kernel to 5.10.16, but the error persists:
Error: only 0 Devices available, 1 requested. Exiting.

I’ve seen that the ERR! appearing in the nvidia-smi output might be “normal”, as it also shows up in some tutorials:

I get the very same output (with the GPU reference changed to RTX 2060):
[screenshot: nvidia-smi output]

Could anyone give me a helping hand?


Thanks for the link to the driver. I installed it and got similar behavior.

  • From a WSL Ubuntu shell, running nvidia-smi:
epinux@DESKTOP-TL2DFPU:/mnt/c/Users/massi$ nvidia-smi
Sun May 16 01:59:47 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.00       Driver Version: 465.21       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:0A:00.0  On |                  N/A |
| 29%   42C    P8    14W / 151W |    618MiB /  8192MiB |    ERR!      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

And running nvidia-smi.exe (which runs on the Windows side):

epinux@DESKTOP-TL2DFPU:/mnt/c/Users/massi$ nvidia-smi.exe
Sun May 16 01:59:50 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.21       Driver Version: 465.21       CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070   WDDM  | 00000000:0A:00.0  On |                  N/A |
| 29%   42C    P8    14W / 151W |    618MiB /  8192MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2024    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A      3748    C+G   ...8bbwe\WindowsTerminal.exe    N/A      |
|    0   N/A  N/A      3844    C+G   ...ontend\Docker Desktop.exe    N/A      |
|    0   N/A  N/A      4608    C+G   ...me\Application\chrome.exe    N/A      |
|    0   N/A  N/A      5304    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A      5940    C+G   C:\Windows\explorer.exe         N/A      |
|    0   N/A  N/A      8088    C+G   ...perience\NVIDIA Share.exe    N/A      |
|    0   N/A  N/A      8152    C+G   ...artMenuExperienceHost.exe    N/A      |
|    0   N/A  N/A     17164    C+G   ...e\Current\LogiOverlay.exe    N/A      |
+-----------------------------------------------------------------------------+
epinux@DESKTOP-TL2DFPU:/mnt/c/Users/massi$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Error: only 0 Devices available, 1 requested.  Exiting.
epinux@DESKTOP-TL2DFPU:/mnt/c/Users/massi$

You may notice the nvidia-smi version mismatch:

  • nvidia-smi
| NVIDIA-SMI 470.00       Driver Version: 465.21       CUDA Version: 11.3
  • nvidia-smi.exe
| NVIDIA-SMI 465.21       Driver Version: 465.21       CUDA Version: 11.3
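
If you only care about the installed driver version (rather than the nvidia-smi tool banner), both binaries can be queried directly; something like this should work, assuming the WSL build of nvidia-smi supports the same query flags as the Windows one:

nvidia-smi --query-gpu=driver_version --format=csv,noheader
nvidia-smi.exe --query-gpu=driver_version --format=csv,noheader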

I will try to perform a clean re-install of the whole WSL system and report back.

Tried with Ubuntu 18.04 and 16.04 with similar results:
“Error: only 0 Devices available, 1 requested. Exiting.”

I also spotted the version difference, as you did; I looked around and guessed that it is not very important.

During the nvidia-docker install (sudo apt-get install nvidia-docker2) I also got an issue with a symbolic link to libcuda.so.1. I fixed it with mklink on the Windows host, but I guess it isn’t really important: if you run sudo ldconfig you will see the warning for the missing symbolic link (which disappears after the mklink), but it does not actually fix anything.
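
For reference, the symlink fix looked roughly like this; it has to be run from an elevated cmd.exe on the Windows host, and the path assumes the default location of the WSL driver libraries (you may need to delete an existing libcuda.so.1 first, as below):

:: elevated cmd.exe on the Windows host
cd C:\Windows\System32\lxss\lib
del libcuda.so.1
mklink libcuda.so.1 libcuda.so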

I was wondering if the issue is the libcuda.so driver. Maybe some of you have different versions from mine, so that I could try a dirty file swap.

Directory of C:\Windows\System32\lxss\lib
16/05/2021 12:53 .
16/12/2020 02:32 133,088 libcuda.so
16/05/2021 12:53 libcuda.so.1 [libcuda.so]
12/05/2021 12:02 785,608 libd3d12.so
12/05/2021 12:02 5,399,104 libd3d12core.so
12/05/2021 12:02 827,904 libdxcore.so
18/03/2021 05:41 6,053,064 libnvcuvid.so.1
18/03/2021 05:41 424,440 libnvidia-encode.so.1
16/12/2020 02:32 192,160 libnvidia-ml.so.1
18/03/2021 05:41 354,808 libnvidia-opticalflow.so.1
16/12/2020 02:32 48,606,768 libnvwgf2umx.so
18/03/2021 05:41 670,104 nvidia-smi
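
The same libraries should also be visible from inside WSL2; a quick way to check what the dynamic loader actually resolves (the mount point below is the usual one, but it may differ between WSL versions):

ls -l /usr/lib/wsl/lib/
ldconfig -p | grep -E 'libcuda|libnvidia-ml'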

@epifanio Just to be clear, the problems I have are related to the use of nvidia-docker. The GPU is working with the CUDA samples from the WSL2 Ubuntu-18.04 side. I mean, when I build (sudo make) the CUDA samples from /usr/local/cuda/samples and then try ./BlackScholes, it runs on the GPU (but without any nvidia-docker).
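
The sequence I use is roughly the following; the path assumes the default samples layout shipped with the CUDA toolkit:

cd /usr/local/cuda/samples/4_Finance/BlackScholes
sudo make
./BlackScholes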

Moreover, when I try the Jupyter notebook example (sudo docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter), it does not show any GPU devices from TensorFlow. In the example notebooks I add a new cell with the following commands:
import tensorflow as tf
tf.config.list_physical_devices()
=> returns
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU')]
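
A quick way to reproduce the same check without opening the notebook is to override the image's default command with a one-liner (same image as above; assuming python is on the image's PATH):

sudo docker run --rm --gpus all tensorflow/tensorflow:latest-gpu-py3-jupyter \
  python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"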

Really? I have Docker Desktop 3.3.3 (64133), and GPU on Docker is not working, with the following output:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
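
One way to see the underlying driver error more directly (instead of going through the Docker daemon) is to ask the container CLI itself; this is the debug invocation NVIDIA's container toolkit docs usually suggest:

nvidia-container-cli -k -d /dev/tty info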

@xinglinqiang Try uninstalling Docker Desktop 3.3.3 and install version 3.3.1 from https://desktop.docker.com/win/stable/amd64/63152/Docker%20Desktop%20Installer.exe?utm_source=docker&utm_medium=webreferral&utm_campaign=docs-driven-download-win-amd64
That’s the only version of Docker Desktop that works for me.

@onomatopellan The latest NVIDIA dev driver 470.14, or some rolled-back version?

The driver is 470.25, which comes from Windows Update for some users. I see you fixed the problem by installing DD 3.3.1 and rebooting Windows. Let’s see if that helps more people with the same issue.

@onomatopellan
After the downgrade, although torch.cuda.is_available() and tf.config.list_physical_devices('GPU') return the actual GPU and the test program (BlackScholes) worked perfectly, normal Python code would freeze forever. I still need to find a workaround.

@xinglinqiang Let’s forget about Docker Desktop and try to fix nvidia-docker again. Quit Docker Desktop, then in WSL2 update to the latest nvidia-docker2 (where libnvidia-container1 is version 1.4.0). After you see the “driver error”, download and install this patched version with sudo apt install ./libnvidia-container1_1.4.0-1_amd64.deb. Does it work now?

libnvidia-container1_1.4.0-1_amd64.deb (65.6 KB)
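
The rough sequence, assuming you run the Docker daemon inside WSL2 rather than through Docker Desktop, would be:

# check which libnvidia-container1 is currently installed
dpkg -l | grep nvidia-container
# update the nvidia-docker2 stack
sudo apt update && sudo apt install nvidia-docker2
# install the patched package over it
sudo apt install ./libnvidia-container1_1.4.0-1_amd64.deb
# restart the daemon so the new library is picked up
sudo service docker restart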


Docker Desktop 3.3.0 works correctly; anything higher than that crashes with the error you describe. See my post here. Uninstall 3.3.3, reboot, then re-install 3.3.0, and make sure you do not upgrade Docker Desktop after that.


Does anyone else also not have the NVIDIA PCI IDs listed with lspci | grep -i nvidia, even after updating with update-pciids?

@liamcarp22 That’s normal. Since what WSL2 actually sees is a GPU abstraction and not the real GPU, you will always see 3D controller: Microsoft Corporation Device 008e in lspci.
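
If you want to confirm the GPU is still reachable despite the generic lspci entry, asking the CUDA stack directly is more informative, for example:

nvidia-smi -L    # lists the GPU by name when the WSL driver stack is working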

I see. It threw me off because the tutorial implied that I should expect output even on WSL2. If I don’t want to wait for this bug to be fixed, how would I go about rolling back the WSL2 driver?

Thanks

I reverted the driver to 465.21 and everything works OK.
Still, I think we need to find the meta-package for nvidia-utils-470 to fix the driver-linking error.


@AKAMolasses The “driver error” is misleading. The culprit is actually libnvidia-container, which was changed on the assumption that the latest NVIDIA Windows driver would already have NVML support. It didn’t, and that’s why NVIDIA-SMI doesn’t work either.

NVIDIA-SMI error aside, the “driver error” can be fixed temporarily just by installing my patched libnvidia-container from my previous post.

Check your installed versions of nvidia-smi. I have the same issue, and I have multiple nvidia-smi binaries on the path, in both my WSL install and the Windows mount.
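
A quick way to list every copy the shell can see (which -a is standard on Ubuntu):

which -a nvidia-smi nvidia-smi.exe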

So is this normal?
My laptop uses an Intel HD 630 and an NVIDIA 1060, but I can’t find the NVIDIA card in WSL2 with the NVIDIA CUDA driver 471.21 on Ubuntu 20.04.

But if I try with Docker, the output looks like this and my NVIDIA GPU is detected:

@projectoneuniverseid Yes, it’s normal. You only need to upgrade your Mesa libs and it will show your GPU name with glxinfo -B.
To upgrade Mesa:

sudo add-apt-repository ppa:kisak/kisak-mesa

sudo apt update && sudo apt upgrade -y
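
To verify afterwards (glxinfo comes from the mesa-utils package, in case it isn’t installed yet):

sudo apt install -y mesa-utils
glxinfo -B | grep -i device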

Yes, now detected, thanks a bunch

