ERRO[2020-06-19T15:32:33.365111100-04:00] stream copy error: reading from a closed fifo
ERRO[2020-06-19T15:32:33.365119800-04:00] stream copy error: reading from a closed fifo
ERRO[2020-06-19T15:32:33.573504400-04:00] e488e831e71fb284dc30ff2ff80e7c9df4f378ec9aca236c6563f1dd505f7c79 cleanup: failed to delete container from containerd: no such container
ERRO[2020-06-19T15:32:33.573551400-04:00] Handler for POST /v1.40/containers/e488e831e71fb284dc30ff2ff80e7c9df4f378ec9aca236c6563f1dd505f7c79/start returned error: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown
Is it because I installed WSL2 before the driver? I am sure I have installed the driver.
Thank you for posting to this forum.
It is possible you have a setup issue with the NVIDIA Container Toolkit.
Do you see any failures in the WSL2 window running the docker service?
Could you stop the docker service and perform the following steps:
Set up the stable and experimental repositories and the GPG key. The changes to the runtime to support WSL 2 are available in the experimental repository.
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
Install the NVIDIA runtime packages (and their dependencies) after updating the package listing.
$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2
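Once those packages are installed, you can restart Docker and try a CUDA container again. A minimal sketch (assuming the Docker daemon is managed with the service command; the nbody sample image is just a convenient test workload):
$ sudo service docker stop
$ sudo service docker start
$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark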
Also, it would be helpful if you could check whether you have the /dev/dxg device in your WSL2 container (to confirm the GPU was added there).
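For example, from inside your WSL2 distribution:
$ ls -l /dev/dxg          # should show a character device if the GPU is exposed to WSL2
$ ls /usr/lib/wsl/lib     # the WSL GPU libraries (libcuda.so, libdxcore.so, ...) should be listed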
Last but not least, please make sure you installed the 455.41+ version of the driver from the download link mentioned in the User Guide.
[quote=“wuguandejn, post:3, topic:128871”]
eGPU by Thunderbolt,
[/quote]
That is an unusual configuration. Do you actually see the GPU working in the container?
Could you check if you get the /dev/dxg device there?
If you are still hitting: “driver error: failed to process request”
This means that nvidia-container fell back to the native path instead of the WSL path and could not communicate with the native driver (which, at that point, is expected).
There are two main causes for that:
Either you have the non-experimental version of nvidia-docker installed (be sure to download nvidia-docker using the procedure described in the User Guide). Only the experimental version of nvidia-docker can be used on WSL.
Or some components for WSL GPU support are missing.
Since you mentioned you got it to work outside docker, it is likely the first one.
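If you want to double-check which case you are hitting, one quick way (a sketch; exact package names can vary by distribution) is to look at what is installed and where it came from:
$ dpkg -l | grep -E 'nvidia-docker|nvidia-container'   # installed NVIDIA container packages and their versions
$ nvidia-container-cli --version                       # version of the container CLI actually on the PATH
$ apt-cache policy libnvidia-container1                # shows whether the experimental repository is the install source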
Also, this might be overkill here, but there is a way to dump the debug log from the NVIDIA container runtime:
Set the environment variable NVC_DEBUG_FILE=/some/file.txt
Make sure the file exists with the right permissions beforehand
Start dockerd manually (if you use sudo, don’t forget to use sudo -E for the env var)
Run the problematic container
The file should now have the logs. If there is no mention of WSL, dxcore, or fallback, you likely have the non-experimental version of nvidia-container. You will need to redo the procedure (there may be some problematic leftover files from a previous install) to get the experimental nvidia-container support. If there are some errors related to dxcore, you can post the logs here and we can take a look.
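Putting the steps above together, a minimal sketch (the log path /tmp/nvc-debug.txt is just a placeholder; pick any file you like):
$ export NVC_DEBUG_FILE=/tmp/nvc-debug.txt
$ touch /tmp/nvc-debug.txt && chmod 666 /tmp/nvc-debug.txt   # make sure the file exists and is writable
$ sudo -E dockerd                                            # run in a separate terminal; -E preserves the env var
$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
$ grep -iE 'wsl|dxcore|fallback' /tmp/nvc-debug.txt          # look for WSL, dxcore, or fallback mentions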
This indicates the problem: you do not have the GPU exposed to the container.
Did you install the WSL2 kernel with Windows Update as described in the User Guide?
Could you show the output of the following commands launched in your WSL2 container:
uname -a
ls /usr/lib/wsl/lib
I updated the Windows version but I am facing another error, "stat failed: /dev/dxg: no such file or directory",
while running this: "docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark"
uname -a
Linux NeilDLaptop 4.19.104-microsoft-standard #1 SMP Wed Feb 19 06:37:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
ls /usr/lib/wsl/lib
libcuda.so libcuda.so.1 libcuda.so.1.1 libd3d12.so libdirectml.so libdxcore.so
Digest: sha256:aaca690913e7c35073df08519f437fa32d4df59a89ef1e012360fbec46524ec8
Status: Downloaded newer image for nvcr.io/nvidia/k8s/cuda-sample:nbody
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: stat failed: /dev/dxg: no such file or directory\\\\n\\\"\"": unknown.
ERRO[0034] error waiting for container: context canceled
You still do not have the right kernel installed. You need kernel version 4.19.121+.
I hope you enabled the optional updates in the Windows Update settings.
You can also try running the wsl --update command on your Windows host.
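For reference, a quick sequence to pull the new kernel and verify it (the first two commands run on the Windows host; depending on your Windows build, wsl --update may not be available and the kernel will come through the optional Windows Updates instead):
wsl --update      # fetch the latest WSL2 kernel
wsl --shutdown    # restart the WSL2 VM so the new kernel is used
uname -r          # run inside WSL2; should now report 4.19.121 or newer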
Thank you kmorozov, by installing all the optional updates I made it work. I should have paid extra attention to the instruction about the kernel version initially.
And one more question: I saw somewhere that there will be a performance hit when using the GPU on WSL2? If there is, I can install a Linux dual-boot system instead and use WSL for prototyping.
Some performance issues are known in WSL2 due to the GPU paravirtualization used to deliver the GPU hardware inside the WSL2 container (please see the NVIDIA CUDA WSL blog for more details). However, the actual numbers really depend on the workload. Generally, for GPU-bound applications, the performance difference from native runs is expected to be significantly lower. There are some corner cases, of course, that need to be looked at on a case-by-case basis.
Was able to build the BlackScholes example without errors. When I try to run it:
[./BlackScholes] - Starting...
CUDA error at ../../common/inc/helper_cuda.h:777 code=35(cudaErrorInsufficientDriver) "cudaGetDeviceCount(&device_count)"
I haven’t tried with the container(s) yet, but before the Ubuntu WSL2 re-install I was getting a similar error, so I’m assuming that will still be an issue until I can fix this.
After installing Docker and the NVIDIA containers per the instructions, trying to run:
docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark results in:
stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\n\""": unknown.