
I am following https://docs.nvidia.com/cuda/wsl-user-guide/index.html and https://devblogs.nvidia.com/announcing-cuda-on-windows-subsystem-for-linux-2/ to set up CUDA with Docker.
However, when I run “docker run --gpus all nvcr.io/nvidia/k8s/cuda-sam… nbody -gpu -benchmark”,
I get the following error:

ERRO[2020-06-19T15:32:33.365111100-04:00] stream copy error: reading from a closed fifo
ERRO[2020-06-19T15:32:33.365119800-04:00] stream copy error: reading from a closed fifo
ERRO[2020-06-19T15:32:33.573504400-04:00] e488e831e71fb284dc30ff2ff80e7c9df4f378ec9aca236c6563f1dd505f7c79 cleanup: failed to delete container from containerd: no such container
ERRO[2020-06-19T15:32:33.573551400-04:00] Handler for POST /v1.40/containers/e488e831e71fb284dc30ff2ff80e7c9df4f378ec9aca236c6563f1dd505f7c79/start returned error: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown

Is it because I installed WSL2 before the driver? I am sure the driver is installed.

Thank you!

Thank you for posting to this forum.
It is possible you have a setup issue with the NVIDIA Container Toolkit.

Do you see any failures in the WSL2 window running docker service?
Could you stop the docker service and perform the following steps:

Set up the stable and experimental repositories and the GPG key. The changes to the runtime to support WSL 2 are available in the experimental repository.


$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
        
Install the NVIDIA runtime packages (and their dependencies) after updating the package listing.


$ sudo apt-get update

$ sudo apt-get install -y nvidia-docker2

Also, it would be helpful if you could check whether the /dev/dxg device exists in your WSL2 distro (to confirm the GPU was added there).

Last but not least, please make sure you installed version 455.41 or later of the driver from the download link mentioned in the User Guide.
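The checks suggested above can be run in one go inside the WSL2 distro; this is just a compact sketch of the commands already mentioned in this thread:

```shell
# Quick WSL2 GPU sanity checks (run inside the WSL2 distro).
ls -la /dev/dxg        # the GPU paravirtualization device; if it is missing,
                       # the GPU is not exposed into WSL at all
ls /usr/lib/wsl/lib    # should list libcuda.so and libdxcore.so among others
uname -r               # kernel must be 4.19.121 or newer for GPU support
```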

Following the same steps and getting the same error.

My Docker showed the same error. Here is my device and system info:

GTX 1080 eGPU via Thunderbolt
GeForce driver 455.41
Windows build 20150.1000
Linux kernel 4.19.121
Ubuntu 18.04 on WSL2
Docker 19.03.6

Did you stop and start the Docker service as indicated in the user guide after installing nvidia-docker2?

This is a

[quote=“wuguandejn, post:3, topic:128871”]
eGPU by Thunderbolt,
[/quote]

configuration. Do you actually see the GPU working in the container?
Could you check whether the /dev/dxg device exists there?

You should make sure no other distro is running, see this thread.

I installed CUDA outside Docker and it works correctly (while I still cannot get nvidia-docker to work).

Hello,

If you are still hitting “driver error: failed to process request”,
it means that nvidia-container fell back to the native path instead of the WSL path and couldn’t communicate with the native driver (which is expected at that point).

There are two main causes for that:

  • Either you have the non-experimental version of nvidia-docker installed (be sure to download nvidia-docker using the procedure described in the user guide). Only the experimental version of nvidia-docker can be used on WSL.
  • Or some components for WSL GPU support are missing.

Since you mentioned you got it to work outside Docker, it is likely the first one.

Also, this might be overkill here, but there is a way to dump the debug log from NVIDIA Container:

  • Set the environment variable NVC_DEBUG_FILE=/some/file.txt
  • Make sure the file exists with the right permissions beforehand
  • Start dockerd manually (if you use sudo, don’t forget sudo -E so the env var is preserved)
  • Run the problematic container
  • The file should now contain the logs. If there is no mention of WSL, dxcore, or a fallback, you likely have the non-experimental version of nvidia-container; you will need to redo the procedure (perhaps there are problematic leftover files from a previous install) to get the experimental nvidia-container support. If there are errors related to dxcore, you can post the logs here and we can take a look.
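The steps above can be sketched in shell form; the log path here is just an example, and the daemon commands are left as comments since they need your environment:

```shell
# Sketch of the debug-log procedure above (paths are examples, adjust to taste).
LOGFILE=/tmp/nvc-debug.txt

# 1. The log file must exist and be writable before dockerd starts:
touch "$LOGFILE" && chmod 666 "$LOGFILE"

# 2. Stop the daemon, then start dockerd manually with the variable set
#    (sudo -E keeps NVC_DEBUG_FILE in the environment):
#      sudo service docker stop
#      export NVC_DEBUG_FILE=$LOGFILE
#      sudo -E dockerd
#
# 3. From another shell, run the failing container, then inspect the log:
#      docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
#      grep -iE 'wsl|dxcore|fallback' "$LOGFILE"
```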

Thanks!

I redid all the steps you suggested and have driver version 455.41 on my Windows host system. But I don’t have /dev/dxg.

That is about CUDA. I think that is a step that comes after the driver problem?

I don’t have this folder /dev/dxg.

This indicates the problem: you don’t have the GPU exposed into the container.
Did you install the WSL2 kernel through Windows Update as described in the User Guide?

Could you show the output of the following commands launched in your WSL2 container:
uname -a
ls /usr/lib/wsl/lib

$ uname -a
Linux NeilDLaptop 4.19.104-microsoft-standard #1 SMP Wed Feb 19 06:37:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

ls /usr/lib/wsl/lib
ls: cannot access ‘/usr/lib/wsl/lib’: No such file or directory

You do not have the right Linux kernel that supports the GPU.
Please follow the Windows Update section of the User Guide: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-wsl2

Got you! Thank you for your help. I will update to the latest one.

I updated the Windows version, but now I am facing another error, “stat failed: /dev/dxg: no such file or directory”,
while running “docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark”

uname -a
Linux NeilDLaptop 4.19.104-microsoft-standard #1 SMP Wed Feb 19 06:37:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

ls /usr/lib/wsl/lib
libcuda.so libcuda.so.1 libcuda.so.1.1 libd3d12.so libdirectml.so libdxcore.so

Digest: sha256:aaca690913e7c35073df08519f437fa32d4df59a89ef1e012360fbec46524ec8
Status: Downloaded newer image for nvcr.io/nvidia/k8s/cuda-sample:nbody
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: stat failed: /dev/dxg: no such file or directory\\\\n\\\"\"": unknown.
ERRO[0034] error waiting for container: context canceled

Linux NeilDLaptop 4.19.104-microsoft-standar

You still do not have the right kernel installed. You need kernel version 4.19.121+.
I hope you enabled the Optional updates in Windows Updates settings.

You can also try running the wsl --update command on your Windows host.
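In case it helps, the kernel-version check can be scripted inside the distro; this is just a sketch using the 4.19.121 minimum named above and sort -V for the comparison:

```shell
# Compare the running kernel against the 4.19.121 minimum mentioned above.
need=4.19.121
have=$(uname -r | cut -d- -f1)   # strip the "-microsoft-standard" suffix

# sort -V orders version strings; if $need sorts first (or equal), $have is new enough
if [ "$(printf '%s\n' "$need" "$have" | sort -V | head -n1)" = "$need" ]; then
    echo "kernel $have is new enough"
else
    echo "kernel $have is too old; run 'wsl --update' on the Windows host"
fi
```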

Thank you kmorozov, by applying all the optional updates I made it work. I should have paid extra attention to the instruction about the kernel version initially.

And one more question: I have seen it said somewhere that there is a performance hit when using the GPU on WSL2? If so, I can install a dual-boot Linux system instead and use WSL for prototyping.

Some performance issues are known in WSL2 due to the GPU paravirtualization used to deliver the GPU hardware inside the WSL2 container (please see the NVIDIA CUDA WSL blog for more details). However, the actual numbers really depend on the workload. Generally, for GPU-bound applications the performance difference from native runs is expected to be small. There are some corner cases, of course, that need to be looked at on a case-by-case basis.

I’m having similar issues. I started over with a fresh Ubuntu 20.04 WSL2 installation:
uname -r
4.19.128-microsoft-standard

Installed the correct NVIDIA driver, 460.20:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.20       Driver Version: 460.20       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208…    WDDM | 00000000:53:00.0  On |                  N/A |
| 24%   55C    P3    65W / 260W |   2506MiB / 11264MiB |     17%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

I was able to build the BlackScholes example without errors. When I try to run it:

[./BlackScholes] - Starting…
CUDA error at ../../common/inc/helper_cuda.h:777 code=35(cudaErrorInsufficientDriver) “cudaGetDeviceCount(&device_count)”

I haven’t tried with the container(s) yet, but I was getting a similar error before the Ubuntu WSL2 re-install, so I’m assuming that will still be an issue until I can fix this.

After installing Docker and the NVIDIA containers per the instructions, trying to run:

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
results in:
stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\n\""": unknown.

@seamans What does it show if you type ls -la /dev/dxg from bash?
What’s your Windows build? (run winver.exe)