CUDA initialization fails on Docker containers with WSL2

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
dGPU on Windows 11 with WSL2 (Ubuntu-22.04)
• DeepStream Version
DeepStream 7.0
• NVIDIA GPU Driver Version (valid for GPU only)
555.85
• Issue Type (questions, new requirements, bugs)
Questions/bugs
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file content, the command line used, and other details for reproducing.)

  • Pull image:
sudo docker pull nvcr.io/nvidia/deepstream:7.0-gc-triton-devel
  • Run container:
sudo docker run -it --privileged --rm --name=docker --net=host --gpus all -e DISPLAY=$DISPLAY -e CUDA_CACHE_DISABLE=0 --device /dev/snd -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/deepstream:7.0-gc-triton-devel

With the previous NVIDIA driver version (552) it used to work, but after updating to 555.85, starting the container shows the following start screen with an error:

===============================
   DeepStreamSDK 7.0.0
===============================

*** LICENSE AGREEMENT ***
By using this software you agree to fully comply with the terms and conditions
of the License Agreement. The License Agreement is located at
/opt/nvidia/deepstream/deepstream/LicenseAgreement.pdf. If you do not agree
to the terms and conditions of the License Agreement do not use the software.


=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.10 (build 72127154)
Triton Server Version 2.39.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

ERROR: The NVIDIA Driver is present, but CUDA failed to initialize.  GPU functionality will not be available.
   [[ Named symbol not found (error 500) ]]
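Error 500 ("named symbol not found") is reported by the CUDA driver API itself, before DeepStream or Triton come into play. A quick way to see the raw initialization result, independent of any framework, is a small ctypes probe (a sketch; it assumes `libcuda.so.1` is on the loader path, as it normally is inside WSL2 GPU containers, and returns a status string instead of raising so it can also run on machines without a GPU):

```python
import ctypes

def probe_cuda_driver(lib_name="libcuda.so.1"):
    """Load the CUDA driver library and call cuInit(0).

    Returns a short status string describing what happened, so it is
    safe to run on machines without a working CUDA driver as well.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return "driver library not found"
    rc = libcuda.cuInit(0)  # 0 on success; 500 corresponds to the error above
    if rc != 0:
        return f"cuInit failed with error {rc}"
    version = ctypes.c_int(0)
    libcuda.cuDriverGetVersion(ctypes.byref(version))
    major, minor = version.value // 1000, (version.value % 1000) // 10
    return f"CUDA driver initialized (API version {major}.{minor})"

if __name__ == "__main__":
    print(probe_cuda_driver())
```

Running this inside the failing container should print the same error 500, which confirms the problem is in the driver/WSL2 layer rather than in the DeepStream image.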

When running nvidia-smi I get:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080 ...    On  |   00000000:01:00.0 Off |                  N/A |
| N/A   53C    P8              9W /  140W |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Before updating the driver, the deepstream-app samples ran fine, whereas now I get this error when running deepstream-app -c /opt/nvidia/deepstream/deepstream-7.0/samples/configs/deepstream-app/source30_1080p_dec_infer-resnet_tiled_display_int8.txt:

GLib (gthread-posix.c): Unexpected error from C library during 'pthread_setspecific': Invalid argument.  Aborting.

(gst-plugin-scanner:121): GStreamer-WARNING **: 11:00:53.400: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/deepstream/libnvdsgst_udp.so': librivermax.so.1: cannot open shared object file: No such file or directory

(gst-plugin-scanner:121): GStreamer-WARNING **: 11:00:53.850: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstmpeg2enc.so': libmpeg2encpp-2.1.so.0: cannot open shared object file: No such file or directory

(gst-plugin-scanner:121): GStreamer-WARNING **: 11:00:53.872: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstchromaprint.so': libavcodec.so.58: cannot open shared object file: No such file or directory

(gst-plugin-scanner:121): GStreamer-WARNING **: 11:00:53.915: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstopenmpt.so': libmpg123.so.0: cannot open shared object file: No such file or directory

(gst-plugin-scanner:124): GStreamer-WARNING **: 11:00:54.139: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstmpeg2dec.so': libmpeg2.so.0: cannot open shared object file: No such file or directory

(gst-plugin-scanner:124): GStreamer-WARNING **: 11:00:54.163: Failed to load plugin '/usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstmpg123.so': libmpg123.so.0: cannot open shared object file: No such file or directory
** ERROR: <create_render_bin:132>: Failed to create 'sink_sub_bin_transform1'
** ERROR: <create_render_bin:184>: create_render_bin failed
** ERROR: <create_sink_bin:855>: create_sink_bin failed
** ERROR: <create_processing_instance:1238>: create_processing_instance failed
** ERROR: <create_pipeline:1876>: create_pipeline failed
** ERROR: <main:687>: Failed to create pipeline
Quitting
nvstreammux: Successfully handled EOS for source_id=0
nvstreammux: Successfully handled EOS for source_id=1
nvstreammux: Successfully handled EOS for source_id=2
nvstreammux: Successfully handled EOS for source_id=3
nvstreammux: Successfully handled EOS for source_id=4
nvstreammux: Successfully handled EOS for source_id=5
nvstreammux: Successfully handled EOS for source_id=6
nvstreammux: Successfully handled EOS for source_id=7
nvstreammux: Successfully handled EOS for source_id=8
nvstreammux: Successfully handled EOS for source_id=9
nvstreammux: Successfully handled EOS for source_id=10
nvstreammux: Successfully handled EOS for source_id=11
nvstreammux: Successfully handled EOS for source_id=12
nvstreammux: Successfully handled EOS for source_id=13
nvstreammux: Successfully handled EOS for source_id=14
nvstreammux: Successfully handled EOS for source_id=15
nvstreammux: Successfully handled EOS for source_id=16
nvstreammux: Successfully handled EOS for source_id=17
nvstreammux: Successfully handled EOS for source_id=18
nvstreammux: Successfully handled EOS for source_id=19
nvstreammux: Successfully handled EOS for source_id=20
nvstreammux: Successfully handled EOS for source_id=21
nvstreammux: Successfully handled EOS for source_id=22
nvstreammux: Successfully handled EOS for source_id=23
nvstreammux: Successfully handled EOS for source_id=24
nvstreammux: Successfully handled EOS for source_id=25
nvstreammux: Successfully handled EOS for source_id=26
nvstreammux: Successfully handled EOS for source_id=27
nvstreammux: Successfully handled EOS for source_id=28
nvstreammux: Successfully handled EOS for source_id=29
App run failed

Also of note: I’m not getting the CUDA initialization error when running the CUDA 12.2.0 container (nvcr.io/nvidia/cuda:12.2.0-devel-ubuntu22.04), so I’m not sure what is wrong; updating the driver is the only thing that has changed since it last worked.


Could you try restarting WSL2 first?

I’ve just reset the WSL2 distro (unregister and reinstall) by running:

wsl --unregister Ubuntu-22.04
wsl --install -d Ubuntu-22.04
wsl --set-default Ubuntu-22.04

Then I started Docker and ran the image in a new container. I keep getting the same error.

OK. We currently only support the driver versions listed in the guide NVIDIA driver (windows version) compatible for your GPU. Upgrading the driver may cause incompatibility issues, so it is best not to upgrade.

I’ve just installed a previous driver. Since the approved version is Game Ready Driver 546.65 and I need the Studio Driver, I installed 552.22, the last release before the jump to CUDA 12.5, and it works perfectly.

I hope future driver versions take this WSL2 use case into account, since updating the driver is usually needed to access new Omniverse Kit versions, which I rely on as well.
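For anyone scripting their setup, a small helper can gate a driver upgrade on a known-good list before touching anything (a sketch; the validated-version list here is hypothetical and only reflects what was reported in this thread — the authoritative list is the compatibility guide):

```python
def driver_is_validated(installed, validated):
    """Return True if the installed driver's branch matches a validated one.

    Compares only the major branch (e.g. '552' from '552.22'), since
    point releases within the same branch kept working in this thread.
    """
    branch = installed.split(".")[0]
    return any(v.split(".")[0] == branch for v in validated)

# Hypothetical known-good list from this thread: 546.65 is the documented
# version, and the 552 branch was reported working.
VALIDATED = ["546.65", "552.22"]

print(driver_is_validated("552.22", VALIDATED))  # True
print(driver_is_validated("555.85", VALIDATED))  # False
```

The installed version string would come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader` on a real system.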

Thanks for the support!


I was running Triton Server 24.04 locally on Windows via Docker. After updating the driver to 555.85, Triton gives me the same error when trying to start:

ERROR: The NVIDIA Driver is present, but CUDA failed to initialize.

I’ve tried updating the container to 24.05, but it didn’t help.

Thank you - at least I can hope that downgrading the driver version will help.

I also encountered a related problem.
Version information:

docker desktop v4.30.0
PS C:\WINDOWS\system32> wsl -v
WSL version: 2.2.4.0
Kernel version: 5.15.153.1-2
WSLg version: 1.0.61
MSRDC version: 1.2.5326
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.26091.1-240325-1447.ge-release
Windows version: 10.0.22635.3720
(fairclip) root@b9600c22467c:/# nvidia-smi
Sun Jun  9 13:43:05 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.01              Driver Version: 555.99         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    On  |   00000000:01:00.0  On |                  N/A |
| N/A   66C    P0             21W /  140W |    1468MiB /   8188MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Using docker pull nvidia/cuda:12.5.0-devel-ubuntu22.04, the torch 2.3.0 environment shows the following error:

>>> import torch
>>> torch.cuda.is_available()
/root/miniconda3/envs/fairclip/lib/python3.9/site-packages/torch/cuda/__init__.py:82: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:112.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>> exit()
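Note that the real cause is in the UserWarning text, not in the `False` return value, so automated checks should capture it rather than just test the boolean. A small wrapper can do that (a sketch; on a real system you would pass `torch.cuda.is_available` as the check, shown here with a stand-in so the snippet runs without torch installed):

```python
import warnings

def cuda_status(check):
    """Call a CUDA availability check and capture any warning it emits.

    Returns (available, warning_text) so the underlying driver error
    (e.g. 'Error 500: named symbol not found') is not silently dropped.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        available = check()
    text = str(caught[0].message) if caught else ""
    return available, text

# Stand-in for torch.cuda.is_available on a box with the broken driver:
def fake_check():
    warnings.warn("CUDA initialization: ... Error 500: named symbol not found")
    return False

ok, why = cuda_status(fake_check)
print(ok, "-", why)
```

With torch installed, `cuda_status(torch.cuda.is_available)` would return `False` together with the full initialization error message from the session above.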

Please follow the instructions in our guide to install the compatible driver and related software versions:
NVIDIA driver (windows version) compatible for your GPU

dGPU model Platform and OS Compatibility