[WSL2 + Docker + NVIDIA Container] CUDA failed to initialize in TensorFlow container
1. System Information
- OS: Windows 11 Pro (WSL2)
- WSL2 Distro: Ubuntu 24.04
- GPU: NVIDIA GeForce RTX 5090 (VRAM 32GB)
- NVIDIA Driver Version: 572.16
- CUDA Version: 12.8
- NVIDIA Container Toolkit Version: Latest
- Docker Version: 26.1.3
- TensorFlow Container: nvcr.io/nvidia/tensorflow:25.01-tf2-py3
2. Steps to Reproduce & Issue Description
✅ 1) NVIDIA driver and GPU recognition check in WSL2
I first verified that the NVIDIA GPU is recognized correctly in WSL2 by running:
```bash
nvidia-smi
```
✅ Output (GPU recognized properly)
```
Thu Feb 20 17:07:18 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.16              Driver Version: 572.16         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        On  |   00000000:01:00.0  On |                  N/A |
|  0%   46C    P8             35W /  575W |    1558MiB /  32607MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```
✅ 2) Checking if Docker containers recognize the GPU
I confirmed that Docker correctly recognizes the GPU by running a CUDA container.
```bash
docker run --rm --gpus all --runtime=nvidia nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
```
✅ Output (GPU detected correctly)
```
Thu Feb 20 08:07:33 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.16              Driver Version: 572.16         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        On  |   00000000:01:00.0  On |                  N/A |
|  0%   46C    P8             33W /  575W |    1548MiB /  32607MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```
✅ This confirms that NVIDIA drivers and CUDA work correctly in the WSL2 + Docker environment.
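For a scriptable version of the same sanity check, I can use a small wrapper like the sketch below; it assumes only that `nvidia-smi` is on `PATH`, which holds both in WSL2 and inside the CUDA container.
```python
# Sketch: scriptable form of the nvidia-smi checks above.
# Assumes only that nvidia-smi is on PATH (true in WSL2 and in the container).
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=name,driver_version,memory.total",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print(out.stdout.strip())  # e.g. "NVIDIA GeForce RTX 5090, 572.16, 32607 MiB"
```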
❌ 3) Issue: TensorFlow container fails to initialize CUDA
I then launched the TensorFlow container and checked whether CUDA was accessible.
```bash
docker run --rm --gpus all -it nvcr.io/nvidia/tensorflow:25.01-tf2-py3 bash
```
✅ Successfully entered the TensorFlow container
```
== TensorFlow ==
NVIDIA Release 25.01-tf2 (build 134984172)
TensorFlow Version 2.17.0
Container image Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
…
```
❌ However, CUDA failed to initialize with the following error:
```
ERROR: The NVIDIA Driver is present, but CUDA failed to initialize. GPU functionality will not be available.
   [[ Named symbol not found (error 500) ]]
```
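To surface the raw driver-API error behind this message, I can call `cuInit` directly via `ctypes` (a minimal sketch; it assumes `libcuda.so.1` resolves on the default loader path inside the container, and error 500 corresponds to `CUDA_ERROR_NOT_FOUND`):
```python
# Minimal sketch: call the CUDA driver API directly to reproduce the error.
# Assumes libcuda.so.1 is resolvable by the dynamic loader in this container.
import ctypes

cuda = ctypes.CDLL("libcuda.so.1")
rc = cuda.cuInit(0)              # 0 == CUDA_SUCCESS; 500 == CUDA_ERROR_NOT_FOUND
print("cuInit returned:", rc)

msg = ctypes.c_char_p()
cuda.cuGetErrorString(rc, ctypes.byref(msg))
print("meaning:", msg.value)     # expect b"named symbol not found" for 500
```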
Next, I checked if TensorFlow inside the container recognized the GPU by running:
```bash
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```
❌ Output
```
[]
```
TensorFlow does not detect any GPUs.
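To rule out a version mismatch between the TensorFlow build and the driver, the build metadata can be compared against the CUDA 12.8 reported by `nvidia-smi` (a sketch using the public `tf.sysconfig.get_build_info()` API):
```python
# Sketch: compare the CUDA/cuDNN versions this TensorFlow wheel was built
# against with the CUDA 12.8 the driver reports via nvidia-smi.
import tensorflow as tf

info = tf.sysconfig.get_build_info()
print("built for CUDA :", info.get("cuda_version"))
print("built for cuDNN:", info.get("cudnn_version"))
print("visible GPUs   :", tf.config.list_physical_devices("GPU"))
```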
3. Debugging Attempts
1️⃣ Checked for libcuda.so inside the container
```bash
find /usr -name "libcuda.so*"
```
✅ Output (Files are present)
```
/usr/local/cuda-12.8/compat/lib.real/libcuda.so
/usr/local/cuda-12.8/compat/lib.real/libcuda.so.1
/usr/local/cuda-12.8/compat/lib.real/libcuda.so.570.86.10
/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so.1
```
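Since the image ships two copies of `libcuda` (the one injected by the container runtime under `/usr/lib/x86_64-linux-gnu` and the forward-compat copy under `/usr/local/cuda-12.8/compat`), a quick probe can test each path individually to see which copy fails to initialize. A sketch (the filename `check_libcuda.py` is just an example; run it once per path so two driver stubs never share one process):
```python
# Sketch: load ONE libcuda copy by absolute path and call cuInit.
# Run once per path found by `find` above, e.g.:
#   python3 check_libcuda.py /usr/lib/x86_64-linux-gnu/libcuda.so.1
#   python3 check_libcuda.py /usr/local/cuda-12.8/compat/lib.real/libcuda.so.1
import ctypes, sys

path = sys.argv[1]
lib = ctypes.CDLL(path)
rc = lib.cuInit(0)           # 0 == CUDA_SUCCESS; 500 == CUDA_ERROR_NOT_FOUND
print(f"{path}: cuInit -> {rc}")
```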
2️⃣ Checked environment variables
```bash
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:/usr/local/cuda-12.8/compat/lib.real:$LD_LIBRARY_PATH
```
Still, the same error persists.
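It may also be worth confirming which `libcuda.so.1` the loader actually resolves after this export, since `LD_LIBRARY_PATH` ordering decides whether the compat copy shadows the WSL2-provided one. A sketch that inspects the process's own memory maps:
```python
# Sketch: after exporting LD_LIBRARY_PATH, check which libcuda.so.1 the
# dynamic loader really picked by reading this process's memory maps.
import ctypes

ctypes.CDLL("libcuda.so.1")
with open("/proc/self/maps") as maps:
    mapped = {line.split()[-1] for line in maps if "libcuda" in line}
print("\n".join(sorted(mapped)))
```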
3️⃣ Ran TensorFlow container with modified options
```bash
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it nvcr.io/nvidia/tensorflow:25.01-tf2-py3 bash
```
🚨 Still, the same `CUDA failed to initialize` error persists.
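One more probe that can be run in both containers for comparison: `cuDriverGetVersion` works even without a successful `cuInit`, so it shows which CUDA version the resolved driver stub actually exposes (a sketch; if it reports something older than 12.8, the container is picking up a stale `libcuda`):
```python
# Sketch: ask the resolved driver stub which CUDA version it exposes.
# cuDriverGetVersion does not require cuInit to have succeeded.
import ctypes

cuda = ctypes.CDLL("libcuda.so.1")
ver = ctypes.c_int()
cuda.cuDriverGetVersion(ctypes.byref(ver))  # encodes 12.8 as 12080
print(f"driver exposes CUDA {ver.value // 1000}.{(ver.value % 1000) // 10}")
```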
4. Summary & Questions
Findings so far:
- NVIDIA driver and CUDA work correctly in WSL2 and Docker.
- The CUDA container (`nvidia/cuda:12.3.0-base-ubuntu22.04`) detects the GPU properly.
- However, the TensorFlow container (`nvcr.io/nvidia/tensorflow:25.01-tf2-py3`) fails to initialize CUDA.
- TensorFlow does not detect the GPU (`tf.config.list_physical_devices('GPU')` returns `[]`).
Questions:
- Is the combination of NVIDIA RTX 5090 + CUDA 12.8 + TensorFlow 2.17.0 expected to work in WSL2?
- What could be causing `CUDA failed to initialize` in the TensorFlow container while the plain CUDA container works fine?
- Are there any additional configurations required to make TensorFlow detect the GPU inside the container?
✅ Looking for guidance on debugging and fixing this issue.
Any insights or suggestions would be greatly appreciated!
5. My nvidia-bug-report.log
nvidia-bug-report.log (259.9 KB)