Hello,
I am trying to make CUDA work under WSL on my system but no matter what I do, it just isn’t working.
I even completely wiped Windows and did multiple fresh installs, with the same result.
I tried with Windows 10 build 21390 and also Windows 11 build 22000.
I tried with WSL kernel 5.10.16 and also with 5.10.43.
I tried with driver version 471.21 and also with 471.41.
I followed the guide "NVIDIA GPU Accelerated Computing on WSL 2" (CUDA on WSL 12.3 documentation), but when I compile and try to run any CUDA application, it always fails with a segmentation fault.
At this point I am out of ideas.
Any suggestions?
gdb output:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff680f219 in ?? () from /usr/lib/wsl/drivers/nv_dispi.inf_amd64_5760b444dd211fa2/libcuda.so.1.1
I have done more testing but to no avail.
Just to reiterate:
I started with a clean Windows 11 install and followed the CUDA on WSL guide to the letter.
I compiled all the samples, and every one I tried ended with a segmentation fault.
I can also reproduce this with a trivial program:
#include <cuda.h>

int main()
{
    /* Initialize the CUDA driver API; this is the only CUDA call in the program. */
    cuInit(0);
    return 0;
}
$ /usr/local/cuda/bin/cuda-gdb cuda_test
NVIDIA (R) CUDA Debugger
11.4 release
Portions Copyright (C) 2007-2021 NVIDIA Corporation
GNU gdb (GDB) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type “show copying” and “show warranty” for details.
This GDB was configured as “x86_64-pc-linux-gnu”.
Type “show configuration” for configuration details.
For bug reporting instructions, please see: https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/.
For help, type “help”.
Type “apropos word” to search for commands related to “word”…
Reading symbols from cuda_test…
(cuda-gdb) r
Starting program: /home/cuda_test/cuda_test
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib/x86_64-linux-gnu/libthread_db.so.1”.
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff65ee399 in ?? () from /usr/lib/wsl/drivers/nv_dispi.inf_amd64_5d5c294bb8d17217/libcuda.so.1.1
Interestingly, I am able to run nvidia-smi just fine:
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
I also installed the docker part and tried the docker examples as instructed.
The nbody container just exits without printing anything.
The tensorflow python notebook fails at running the cells with “dead kernel” status.
Finally, the tensorflow resnet example fails with a segmentation fault.
All in all, it looks like CUDA under WSL is simply not working on my system, always ending in segfault during initialization.
The post "Known Limitations with CUDA on WSL 2" mentions problems on AMD systems but says they were fixed in later versions.
I happen to be running on a first generation Threadripper platform so maybe that is relevant? I am attaching an output from DxDiag to give more info on my hardware and software.
It would be nice if someone from Nvidia could take a look at this.
Thanks.
Attachment: DxDiag.txt (21.4 KB)
Progress update:
I removed (uninstalled in Windows, not physically removed) the three GPUs that are not attached to the display, and that seems to have solved the problem. As soon as I add one of those GPUs back, the problem reappears.
Interesting. Either it’s a limitation of the NVIDIA driver, or something is having trouble initializing more than one GPU in WSL2.
WSL2 in Windows 11 comes with WSLg, i.e. GUI and sound support. It is enabled by default and reserves memory for each GPU. To rule it out as the cause of the segmentation fault, disable it by adding the line guiapplications=false to your .wslconfig file,
followed by running wsl.exe --shutdown to make the changes effective.
guiapplications=false has no effect on the CUDA problem; the WSL GUI applications work regardless of which GPUs are enabled.
What is interesting is that if I execute any CUDA code inside WSL for the first time while only one GPU is enabled, it initializes properly. I can then enable the other GPUs, up to all 4, and they work too (most of the time) until I shut down the WSL instance.
There is some randomness to this, though. Sometimes I am able to do the first initialization with 2 GPUs, but the more GPUs are enabled, the less likely it is to succeed; not once was I able to start with all 4. Adding GPUs after initialization also works somewhat randomly, with the success rate decreasing as the number of enabled GPUs increases.
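Because the failure is a SIGSEGV inside libcuda rather than an error code, one way to measure how often initialization survives is to run each attempt in a child process and look at how the child ended. The following is only a rough sketch of that idea, not something taken from the guide: run_probe_in_child, count_successes, probe_ok and probe_crash are made-up names, and the two probes are stand-ins that do not touch CUDA at all (they exist so the helper can be tried on a machine without a GPU).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A probe returns 0 on success and nonzero on failure. */
typedef int (*probe_fn)(void);

#define PROBE_ERROR (-1000) /* fork() or waitpid() itself failed */

/*
 * Run `probe` in a forked child so a crash inside it cannot kill the caller.
 * Returns 0 if the probe returned 0, 1 if it returned nonzero,
 * -signo if the child was killed by a signal (e.g. -SIGSEGV),
 * or PROBE_ERROR if the child could not be started or waited for.
 */
int run_probe_in_child(probe_fn probe)
{
    assert(probe != NULL);
    fflush(NULL); /* do not let the child inherit unflushed stdio buffers */

    pid_t pid = fork();
    if (pid < 0)
        return PROBE_ERROR;
    if (pid == 0)
        _exit(probe() == 0 ? 0 : 1); /* child: report the result via exit status */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return PROBE_ERROR;
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return PROBE_ERROR;
}

/* Count how many of `attempts` runs ended with a successful probe. */
int count_successes(probe_fn probe, int attempts)
{
    int ok = 0;
    for (int i = 0; i < attempts; i++)
        if (run_probe_in_child(probe) == 0)
            ok++;
    return ok;
}

/* Hypothetical stand-ins (assumptions, no CUDA involved): one probe that
 * succeeds and one that dies with SIGSEGV like the reproducer does. */
int probe_ok(void) { return 0; }
int probe_crash(void) { raise(SIGSEGV); return 0; }
```

For a real measurement, the stand-in would be replaced by a probe that includes <cuda.h>, calls cuInit(0), and returns nonzero unless the result is CUDA_SUCCESS; a small main could then print count_successes(that_probe, 10). Since the behaviour seems to depend on what was already initialized in the running WSL instance, the attempts are probably not independent unless the instance is shut down between them.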
I guess that’s it for now… I have a workaround for my problem, and I hope NVIDIA fixes the problem in the future.