
I am following https://docs.nvidia.com/cuda/wsl-user-guide/index.html and https://devblogs.nvidia.com/announcing-cuda-on-windows-subsystem-for-linux-2/ to set up CUDA with Docker.
However, when I run “docker run --gpus all nvcr.io/nvidia/k8s/cuda-sam… nbody -gpu -benchmark”,
I get the following error:

ERRO[2020-06-19T15:32:33.365111100-04:00] stream copy error: reading from a closed fifo
ERRO[2020-06-19T15:32:33.365119800-04:00] stream copy error: reading from a closed fifo
ERRO[2020-06-19T15:32:33.573504400-04:00] e488e831e71fb284dc30ff2ff80e7c9df4f378ec9aca236c6563f1dd505f7c79 cleanup: failed to delete container from containerd: no such container
ERRO[2020-06-19T15:32:33.573551400-04:00] Handler for POST /v1.40/containers/e488e831e71fb284dc30ff2ff80e7c9df4f378ec9aca236c6563f1dd505f7c79/start returned error: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown

Is it because I installed WSL2 before the driver? I am sure the driver is installed.

Thank you!

Thank you for posting to this forum.
It is possible you have a setup issue with the NVIDIA Container Toolkit.

Do you see any failures in the WSL2 window running docker service?
Could you stop the docker service and perform the following steps:

Set up the stable and experimental repositories and the GPG key. The changes to the runtime to support WSL 2 are available in the experimental repository.


$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
        
Install the NVIDIA runtime packages (and their dependencies) after updating the package listing.


$ sudo apt-get update

$ sudo apt-get install -y nvidia-docker2

Also, it would be helpful if you could check whether the /dev/dxg device exists in your WSL2 distro (to confirm the GPU was added there).

Last but not least, please make sure you installed version 455.41 or later of the driver from the download link mentioned in the User Guide.
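The checks suggested above can be run in one go inside the WSL2 distro; this is just a compact sketch of the commands already mentioned in this thread:

```shell
# Quick WSL2 GPU sanity checks (run inside the WSL2 distro).
ls -la /dev/dxg        # the GPU paravirtualization device; if it is missing,
                       # the GPU is not exposed into WSL at all
ls /usr/lib/wsl/lib    # should list libcuda.so and libdxcore.so among others
uname -r               # kernel must be 4.19.121 or newer for GPU support
```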

Following the same steps and getting the same error.

My Docker showed the same error. Here is my device and system info:

GTX 1080 eGPU via Thunderbolt
GeForce driver 455.41
Windows build 20150.1000
Linux kernel 4.19.121
Ubuntu 18.04 on WSL2
Docker 19.03.6

Did you stop and start the Docker service as indicated in the user guide after installing nvidia-docker2?

This is a

[quote=“wuguandejn, post:3, topic:128871”]
eGPU by Thunderbolt,
[/quote]

configuration. Do you actually see the GPU working in the container?
Could you check whether the /dev/dxg device exists there?

You should make sure no other distro is running, see this thread.

I installed CUDA outside Docker and it works correctly (while I still cannot get nvidia-docker to work).

Hello,

If you are still hitting “driver error: failed to process request”,
it means that nvidia-container fell back to the native path instead of the WSL path and couldn’t communicate with the native driver (which is expected at that point).

There are two main causes for that:

  • Either you have the non-experimental version of nvidia-docker installed (be sure to download nvidia-docker using the procedure described in the user guide). Only the experimental version of nvidia-docker can be used on WSL.
  • Or some components for WSL GPU support are missing.

Since you mentioned you got it to work outside Docker, it is likely the first one.

Also, this might be overkill here, but there is a way to dump the debug log from NVIDIA Container:

  • Set the environment variable NVC_DEBUG_FILE=/some/file.txt
  • Make sure the file exists with the right permissions beforehand
  • Start dockerd manually (if you use sudo, don’t forget sudo -E so the env var is preserved)
  • Run the problematic container
  • The file should now contain the logs. If there is no mention of WSL, dxcore, or a fallback, you likely have the non-experimental version of nvidia-container; you will need to redo the procedure (perhaps there are problematic leftover files from a previous install) to get the experimental nvidia-container support. If there are errors related to dxcore, you can post the logs here and we can take a look.
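The steps above can be sketched in shell form; the log path here is just an example, and the daemon commands are left as comments since they need your environment:

```shell
# Sketch of the debug-log procedure above (paths are examples, adjust to taste).
LOGFILE=/tmp/nvc-debug.txt

# 1. The log file must exist and be writable before dockerd starts:
touch "$LOGFILE" && chmod 666 "$LOGFILE"

# 2. Stop the daemon, then start dockerd manually with the variable set
#    (sudo -E keeps NVC_DEBUG_FILE in the environment):
#      sudo service docker stop
#      export NVC_DEBUG_FILE=$LOGFILE
#      sudo -E dockerd
#
# 3. From another shell, run the failing container, then inspect the log:
#      docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
#      grep -iE 'wsl|dxcore|fallback' "$LOGFILE"
```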

Thanks!

I redid all the steps you suggested and have driver version 455.41 on my Windows host system. But I don’t have /dev/dxg.

That is about CUDA. I think that is a step that comes after the driver problem?

I don’t have this folder /dev/dxg.

This indicates the problem: you don’t have the GPU exposed into the container.
Did you install the WSL2 kernel through Windows Update as described in the User Guide?

Could you show the output of the following commands launched in your WSL2 container:
uname -a
ls /usr/lib/wsl/lib

$ uname -a
Linux NeilDLaptop 4.19.104-microsoft-standard #1 SMP Wed Feb 19 06:37:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

ls /usr/lib/wsl/lib
ls: cannot access ‘/usr/lib/wsl/lib’: No such file or directory

You do not have the right Linux kernel that supports the GPU.
Please follow the Windows Update section of the User Guide: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-wsl2

Got you! Thank you for your help. I will update to the latest one.

I updated the Windows version, but now I am facing another error, “stat failed: /dev/dxg: no such file or directory”,
while running “docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark”

uname -a
Linux NeilDLaptop 4.19.104-microsoft-standard #1 SMP Wed Feb 19 06:37:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

ls /usr/lib/wsl/lib
libcuda.so libcuda.so.1 libcuda.so.1.1 libd3d12.so libdirectml.so libdxcore.so

Digest: sha256:aaca690913e7c35073df08519f437fa32d4df59a89ef1e012360fbec46524ec8
Status: Downloaded newer image for nvcr.io/nvidia/k8s/cuda-sample:nbody
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: stat failed: /dev/dxg: no such file or directory\\\\n\\\"\"": unknown.
ERRO[0034] error waiting for container: context canceled

Linux NeilDLaptop 4.19.104-microsoft-standar

You still do not have the right kernel installed. You need kernel version 4.19.121+.
I hope you enabled the Optional updates in Windows Updates settings.

You can also try running the wsl --update command on your Windows host.
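In case it helps, the kernel-version check can be scripted inside the distro; this is just a sketch using the 4.19.121 minimum named above and sort -V for the comparison:

```shell
# Compare the running kernel against the 4.19.121 minimum mentioned above.
need=4.19.121
have=$(uname -r | cut -d- -f1)   # strip the "-microsoft-standard" suffix

# sort -V orders version strings; if $need sorts first (or equal), $have is new enough
if [ "$(printf '%s\n' "$need" "$have" | sort -V | head -n1)" = "$need" ]; then
    echo "kernel $have is new enough"
else
    echo "kernel $have is too old; run 'wsl --update' on the Windows host"
fi
```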

Thank you kmorozov, by applying all the optional updates I made it work. I should have paid extra attention to the instruction about the kernel version initially.

And one more question: I have seen it said somewhere that there is a performance hit when using the GPU on WSL2? If so, I can install a dual-boot Linux system instead and use WSL for prototyping.

Some performance issues are known in WSL2 due to the GPU paravirtualization used to deliver the GPU hardware inside the WSL2 container (please see the NVIDIA CUDA WSL blog for more details). However, the actual numbers really depend on the workload. Generally, for GPU-bound applications the performance difference from native runs is expected to be small. There are some corner cases, of course, that need to be looked at on a case-by-case basis.

I’m having similar issues. I started over with a fresh Ubuntu 20.04 WSL2 installation:
uname -r
4.19.128-microsoft-standard

Installed the correct NVIDIA driver, 460.20:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.20       Driver Version: 460.20       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208…    WDDM | 00000000:53:00.0  On |                  N/A |
| 24%   55C    P3    65W / 260W |   2506MiB / 11264MiB |     17%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

I was able to build the BlackScholes example without errors. When I try to run it:

[./BlackScholes] - Starting…
CUDA error at ../../common/inc/helper_cuda.h:777 code=35(cudaErrorInsufficientDriver) “cudaGetDeviceCount(&device_count)”

I haven’t tried with the container(s) yet, but I was getting a similar error before the Ubuntu WSL2 re-install, so I’m assuming that will still be an issue until I can fix this.

After installing Docker and the NVIDIA containers per the instructions, trying to run:

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
results in:
stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\n\""": unknown.

@seamans What does it show if you type ls -la /dev/dxg from bash?
What’s your Windows build? (run winver.exe)