Segmentation fault in cuInit()

Hello,
I am trying to make CUDA work under WSL on my system but no matter what I do, it just isn’t working.
I even completely wiped Windows and did multiple fresh installs, with the same result.
I tried with Windows 10 build 21390 and also Windows 11 build 22000.
I tried with WSL kernel 5.10.16 and also with 5.10.43.
I tried with driver version 471.21 and also with 471.41.
I followed the guide (NVIDIA GPU Accelerated Computing on WSL 2, from the CUDA on WSL 12.3 documentation), but whenever I compile and run any CUDA application, it fails with a segmentation fault.
At this point I am out of ideas.
Any suggestions?

gdb output:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff680f219 in ?? () from /usr/lib/wsl/drivers/nv_dispi.inf_amd64_5760b444dd211fa2/libcuda.so.1.1

I have done more testing but to no avail.
Just to reiterate:
I started with a clean Windows 11 install and followed the CUDA on WSL guide to the letter.
I compiled all the samples and each one I tried ended up with Segmentation Fault.
I can also reproduce this with a trivial program:

#include <cuda.h>

int main()
{
    cuInit(0);  // the segmentation fault happens inside this call
    return 0;
}

compiled with:

$ g++ -g -I/usr/local/cuda/include -L/usr/lib/wsl/lib cuda_test.cpp -lcuda -o cuda_test

when debugged with cuda-gdb:

$ /usr/local/cuda/bin/cuda-gdb cuda_test
NVIDIA (R) CUDA Debugger
11.4 release
Portions Copyright (C) 2007-2021 NVIDIA Corporation
GNU gdb (GDB) 10.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type “show copying” and “show warranty” for details.
This GDB was configured as “x86_64-pc-linux-gnu”.
Type “show configuration” for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type “help”.
Type “apropos word” to search for commands related to “word”…
Reading symbols from cuda_test…
(cuda-gdb) r
Starting program: /home/cuda_test/cuda_test
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib/x86_64-linux-gnu/libthread_db.so.1”.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff65ee399 in ?? () from /usr/lib/wsl/drivers/nv_dispi.inf_amd64_5d5c294bb8d17217/libcuda.so.1.1
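
A crash like this can also be captured without a debugger. The following is a minimal Python sketch, not part of the original report: it calls cuInit(0) through ctypes in a forked child process, so a segmentation fault inside libcuda.so.1 is reported as a signal instead of killing the caller. The helper names run_isolated and call_cuinit are made up for illustration, and the sketch assumes libcuda.so.1 can be found by the dynamic loader (on WSL the library is under /usr/lib/wsl/lib).

```python
import ctypes
import os
import signal


def run_isolated(fn):
    """Run fn() in a forked child and report how the child ended.

    Returns ("result", text), ("error", exit_code) or ("signal", signum).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send the result through the pipe, then exit immediately
        # without running any of the parent's cleanup handlers.
        os.close(read_fd)
        try:
            os.write(write_fd, str(fn()).encode())
        except BaseException:
            os._exit(1)
        os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        text = pipe.read()
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return ("signal", os.WTERMSIG(status))
    if os.WEXITSTATUS(status) != 0:
        return ("error", os.WEXITSTATUS(status))
    return ("result", text)


def call_cuinit(libname="libcuda.so.1"):
    """Load the CUDA driver library and return the CUresult of cuInit(0)."""
    lib = ctypes.CDLL(libname)  # raises OSError if the library is missing
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    return lib.cuInit(0)


if __name__ == "__main__":
    kind, value = run_isolated(call_cuinit)
    if kind == "signal":
        print("cuInit crashed with", signal.Signals(value).name)
    elif kind == "error":
        print("could not call cuInit (is libcuda.so.1 on the loader path?)")
    else:
        print("cuInit returned CUresult", value)
```

A result of 0 corresponds to CUDA_SUCCESS; on a system showing the behavior described here, the expected report would be a SIGSEGV.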

Interestingly, I am able to run nvidia-smi just fine:

$ /usr/lib/wsl/lib/nvidia-smi
Tue Jul 27 11:40:35 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.01    Driver Version: 471.41       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce …    Off  | 00000000:08:00.0 Off |                  N/A |
|  0%   36C    P8    11W / 275W |    136MiB / 11264MiB |     ERR!     Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce …    Off  | 00000000:09:00.0 Off |                  N/A |
|  0%   32C    P8    13W / 275W |    136MiB / 11264MiB |     ERR!     Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce …    Off  | 00000000:43:00.0 Off |                  N/A |
|  0%   35C    P8    12W / 275W |    136MiB / 11264MiB |     ERR!     Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce …    Off  | 00000000:44:00.0  On |                  N/A |
|  0%   30C    P8    20W / 275W |   1730MiB / 11264MiB |     ERR!     Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I also installed the Docker part and tried the Docker examples as instructed.
The nbody container just exits without printing anything.
The TensorFlow Python notebook fails when running cells, with a “dead kernel” status.
Finally, the TensorFlow ResNet example fails with a segmentation fault:

root@6f9d314bee10:/workspace/nvidia-examples# python cnn/resnet.py
2021-07-27 09:20:10.439623: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.2
2021-07-27 09:20:11.057514: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.7
2021-07-27 09:20:11.058215: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.7
PY 3.6.9 (default, Nov 7 2019, 10:44:02)
[GCC 8.3.0]
TF 2.1.0
Script arguments:
--image_width=224
--image_height=224
--distort_color=False
--momentum=0.9
--loss_scale=128.0
--image_format=channels_last
--data_dir=None
--data_idx_dir=None
--batch_size=256
--num_iter=300
--iter_unit=batch
--log_dir=None
--export_dir=None
--tensorboard_dir=None
--display_every=10
--precision=fp16
--dali_mode=None
--use_xla=False
--predict=False
2021-07-27 09:20:11.679096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
[6f9d314bee10:00337] *** Process received signal ***
[6f9d314bee10:00337] Signal: Segmentation fault (11)
[6f9d314bee10:00337] Signal code: Address not mapped (1)
[6f9d314bee10:00337] Failing at address: 0x3a60
[6f9d314bee10:00337] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7f975b5f5f20]
[6f9d314bee10:00337] [ 1] /usr/lib/wsl/drivers/nv_dispi.inf_amd64_5d5c294bb8d17217/libcuda.so.1.1(+0x3c2399)[0x7f970ff64399]
[6f9d314bee10:00337] [ 2] /usr/lib/wsl/drivers/nv_dispi.inf_amd64_5d5c294bb8d17217/libcuda.so.1.1(+0x3a4140)[0x7f970ff46140]
[6f9d314bee10:00337] [ 3] /usr/lib/wsl/drivers/nv_dispi.inf_amd64_5d5c294bb8d17217/libcuda.so.1.1(+0x2d7bfb)[0x7f970fe79bfb]
[6f9d314bee10:00337] [ 4] /usr/lib/wsl/drivers/nv_dispi.inf_amd64_5d5c294bb8d17217/libcuda.so.1.1(+0x2d8d1b)[0x7f970fe7ad1b]
[6f9d314bee10:00337] [ 5] /usr/lib/wsl/drivers/nv_dispi.inf_amd64_5d5c294bb8d17217/libcuda.so.1.1(+0x4237ed)[0x7f970ffc57ed]
[6f9d314bee10:00337] [ 6] /usr/lib/wsl/drivers/nv_dispi.inf_amd64_5d5c294bb8d17217/libcuda.so.1.1(+0x2d0450)[0x7f970fe72450]
[6f9d314bee10:00337] [ 7] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(+0xb979220)[0x7f966b109220]
[6f9d314bee10:00337] [ 8] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(+0xb86537e)[0x7f966aff537e]
[6f9d314bee10:00337] [ 9] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(_ZN15stream_executor3gpu9GpuDriver4InitEv+0x8d)[0x7f966aff558d]
[6f9d314bee10:00337] [10] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.2(_ZNK15stream_executor3gpu12CudaPlatform18VisibleDeviceCountEv+0x12)[0x7f965ef66652]
[6f9d314bee10:00337] [11] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(_ZN10tensorflow19XlaGpuDeviceFactory19ListPhysicalDevicesEPSt6vectorISsSaISsEE+0x11d)[0x7f966324948d]
[6f9d314bee10:00337] [12] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/…/libtensorflow_framework.so.2(_ZN10tensorflow13DeviceFactory22ListAllPhysicalDevicesEPSt6vectorISsSaISsEE+0xcd)[0x7f965e9f4f5d]
[6f9d314bee10:00337] [13] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(+0x2613690)[0x7f9661da3690]
[6f9d314bee10:00337] [14] python[0x50a8af]
[6f9d314bee10:00337] [15] python(_PyEval_EvalFrameDefault+0x449)[0x50c5b9]
[6f9d314bee10:00337] [16] python[0x509d48]
[6f9d314bee10:00337] [17] python[0x50aa7d]
[6f9d314bee10:00337] [18] python(_PyEval_EvalFrameDefault+0x449)[0x50c5b9]
[6f9d314bee10:00337] [19] python[0x508245]
[6f9d314bee10:00337] [20] python[0x50a080]
[6f9d314bee10:00337] [21] python[0x50aa7d]
[6f9d314bee10:00337] [22] python(_PyEval_EvalFrameDefault+0x449)[0x50c5b9]
[6f9d314bee10:00337] [23] python[0x508245]
[6f9d314bee10:00337] [24] python[0x50a080]
[6f9d314bee10:00337] [25] python[0x50aa7d]
[6f9d314bee10:00337] [26] python(_PyEval_EvalFrameDefault+0x449)[0x50c5b9]
[6f9d314bee10:00337] [27] python[0x509d48]
[6f9d314bee10:00337] [28] python[0x50aa7d]
[6f9d314bee10:00337] [29] python(_PyEval_EvalFrameDefault+0x449)[0x50c5b9]
[6f9d314bee10:00337] *** End of error message ***
Segmentation fault
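
For a bug report it can help to reduce a backtrace like this to module-relative offsets, which stay the same across runs even though load addresses change. The sketch below is an illustration only (parse_frame and FRAME_RE are hypothetical names, and the pattern is fitted to the line format shown above, not to any documented format). It splits one frame into module, symbol, offset and address; subtracting the offset from the address gives the module's load base.

```python
import re

# Matches e.g. "[ 1] /path/libcuda.so.1.1(+0x3c2399)[0x7f970ff64399]",
# "[15] python(_PyEval_EvalFrameDefault+0x449)[0x50c5b9]" and
# "[14] python[0x50a8af]".
FRAME_RE = re.compile(
    r"\[\s*(\d+)\]\s+(\S+?)(?:\(([^)+]*)\+0x([0-9a-f]+)\))?\[0x([0-9a-f]+)\]\s*$",
    re.IGNORECASE,
)


def parse_frame(line):
    """Split one backtrace line into index, module, symbol, offset, address."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    index, module, symbol, offset, address = match.groups()
    return {
        "index": int(index),
        "module": module,
        "symbol": symbol or None,
        "offset": int(offset, 16) if offset else None,
        "address": int(address, 16),
    }


if __name__ == "__main__":
    line = ("[6f9d314bee10:00337] [ 1] /usr/lib/wsl/drivers/"
            "nv_dispi.inf_amd64_5d5c294bb8d17217/libcuda.so.1.1"
            "(+0x3c2399)[0x7f970ff64399]")
    frame = parse_frame(line)
    # Load base of the module = absolute address - offset within the module.
    print(hex(frame["offset"]), hex(frame["address"] - frame["offset"]))
```

Applied to frame 1, this gives offset 0x3c2399 inside libcuda.so.1.1. The cuda-gdb crash address 0x00007ffff65ee399, reported for the same driver directory, minus that offset is the page-aligned value 0x7ffff622c000, which is consistent with (though not proof of) both crashes happening at the same instruction.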

All in all, it looks like CUDA under WSL is simply not working on my system, always ending in segfault during initialization.
This post mentions problems on AMD systems (Known Limitations with CUDA on WSL 2), but says they were fixed in later versions.
I happen to be running on a first generation Threadripper platform so maybe that is relevant? I am attaching an output from DxDiag to give more info on my hardware and software.

It would be nice if someone from Nvidia could take a look at this.
Thanks.
DxDiag.txt (21.4 KB)

I can’t reproduce it on an i5 9400F (6 cores) with 16 GB RAM.

I think it’s worth trying to reduce the number of “processors” and ram assigned to the WSL2 VM.
Manage Linux Distributions | Microsoft Docs

Thanks. I changed it to 4GB and 2 cores, same result.

Looking at your DxDiag.txt, are you running 4 GTX 1080 Ti GPUs in SLI?

4 x 1080 Ti but no SLI.

Progress update:
I removed (uninstalled in Windows, not physically removed) the three GPUs that are not attached to the display, and that seems to have solved the problem. As soon as I add one of those GPUs back, the problem reappears.

@ nvidia: Should I file a bug report?

Interesting. Either it’s a limitation of the NVIDIA driver, or something has trouble initializing more than one GPU in WSL2.

WSL2 in Windows 11 comes with WSLg, i.e. GUI and sound support. It is enabled by default and reserves memory for each GPU, so one way to rule it out as the cause of the segmentation fault is to disable it by adding this line to your .wslconfig file:
guiapplications=false
Then run wsl.exe --shutdown for the change to take effect.
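
As a rough sketch of that edit (not something posted in the thread), the Python helper below sets a key in the [wsl2] section of .wslconfig text, which is the section where settings such as memory, processors and the GUI switch belong. The function name set_wsl2_option is hypothetical. The key spelling follows this thread, while Microsoft's documentation writes it as guiApplications. Note that configparser discards comments when it rewrites the text.

```python
import configparser
import io


def set_wsl2_option(config_text, key, value):
    """Return config_text with key=value set in the [wsl2] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the key spelling exactly as given
    parser.read_string(config_text)
    if not parser.has_section("wsl2"):
        parser.add_section("wsl2")
    parser.set("wsl2", key, value)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    # Example input: the memory and core limits tried earlier in the thread.
    existing = "[wsl2]\nmemory=4GB\nprocessors=2\n"
    print(set_wsl2_option(existing, "guiapplications", "false"))
```

The result would be saved as .wslconfig in the Windows user profile directory, followed by wsl.exe --shutdown as described above.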

I played with it a little more.

guiapplications=false has no effect on the CUDA problem; the WSL GUI applications work regardless of which GPUs are enabled.
What is interesting is that if I run any CUDA code inside WSL for the first time while only one GPU is enabled, it initializes properly. I can then enable the other GPUs, up to all 4, and they work too (most of the time) until I shut down the WSL instance.

There is some randomness to this, though. Sometimes I am able to do the first initialization with 2 GPUs, but the more GPUs are enabled, the less likely it is to succeed; not once was I able to start with all 4. Adding GPUs after initialization also works somewhat randomly, with decreasing success as the number of enabled GPUs increases.
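
To put numbers on this kind of intermittent failure, a small harness can run the test program repeatedly and count how each run ends. This is only a sketch under assumptions: tally_runs is a hypothetical helper, and ./cuda_test stands for the trivial cuInit program from earlier in the thread. Since the first initialization after a WSL restart appears to behave differently from later ones, a tally taken inside one running instance mainly measures the "adding GPUs afterwards" case.

```python
import collections
import signal
import subprocess
import sys


def tally_runs(command, runs):
    """Run command `runs` times and count how each run ended."""
    outcomes = collections.Counter()
    for _ in range(runs):
        code = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        ).returncode
        if code < 0:
            # On POSIX, death by signal N is reported as return code -N.
            outcomes[signal.Signals(-code).name] += 1
        elif code == 0:
            outcomes["ok"] += 1
        else:
            outcomes[f"exit {code}"] += 1
    return outcomes


if __name__ == "__main__":
    # Pass a different command on the command line to test something else.
    command = sys.argv[1:] or ["./cuda_test"]
    print(tally_runs(command, 20))
```

Comparing the counts for 1, 2, 3 and 4 enabled GPUs would give a concrete success rate to include in a bug report.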

I guess that’s it for now… I have a workaround for my problem, and I hope NVIDIA fixes it in the future.
