[WSL2 + Docker + TensorFlow] CUDA failed to initialize in TensorFlow container (RTX 5090, CUDA 12.8)

1. System Information

  • OS: Windows 11 Pro (WSL2)
  • WSL2 Distro: Ubuntu 24.04
  • GPU: NVIDIA GeForce RTX 5090 (VRAM 32GB)
  • NVIDIA Driver Version: 572.16
  • CUDA Version: 12.8
  • NVIDIA Container Toolkit Version: Latest
  • Docker Version: 26.1.3
  • TensorFlow Container: nvcr.io/nvidia/tensorflow:25.01-tf2-py3

2. Steps to Reproduce & Issue Description

✅ 1) NVIDIA driver and GPU recognition check in WSL2

I first verified that the NVIDIA GPU is recognized correctly in WSL2 by running:
nvidia-smi

Output (GPU recognized properly)
Thu Feb 20 17:07:18 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.16              Driver Version: 572.16         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        On  |   00000000:01:00.0  On |                  N/A |
|  0%   46C    P8             35W /  575W |    1558MiB /  32607MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

✅ 2) Checking if Docker containers recognize the GPU

I confirmed that Docker correctly recognizes the GPU by running a CUDA container.
docker run --rm --gpus all --runtime=nvidia nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
Output (GPU detected correctly)
Thu Feb 20 08:07:33 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.16              Driver Version: 572.16         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        On  |   00000000:01:00.0  On |                  N/A |
|  0%   46C    P8             33W /  575W |    1548MiB /  32607MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

This confirms that NVIDIA drivers and CUDA work correctly in the WSL2 + Docker environment.

❌ 3) Issue: TensorFlow container fails to initialize CUDA

I then launched the TensorFlow container and checked if CUDA was accessible.

bash
docker run --rm --gpus all -it nvcr.io/nvidia/tensorflow:25.01-tf2-py3 bash

Successfully entered the TensorFlow container

== TensorFlow ==

NVIDIA Release 25.01-tf2 (build 134984172)
TensorFlow Version 2.17.0
Container image Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

However, CUDA failed to initialize with the following error:
ERROR: The NVIDIA Driver is present, but CUDA failed to initialize. GPU functionality will not be available.
[[ Named symbol not found (error 500) ]]
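
To narrow down where the failure happens, the driver API can be called directly via ctypes from inside the container, independent of TensorFlow. This is only a minimal sketch; it assumes libcuda.so.1 is resolvable on the loader path, and error 500 corresponds to CUDA_ERROR_NOT_FOUND ("named symbol not found").

python
# Sketch: call cuInit() directly through the CUDA driver API to reproduce the
# error code outside of TensorFlow. 0 means CUDA_SUCCESS; 500 means
# CUDA_ERROR_NOT_FOUND ("named symbol not found").
import ctypes

cuda = ctypes.CDLL("libcuda.so.1")          # whichever libcuda the dynamic loader resolves
print("cuInit returned:", cuda.cuInit(0))   # CUresult cuInit(unsigned int Flags)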

Next, I checked if TensorFlow inside the container recognized the GPU by running:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Output

[]

TensorFlow does not detect any GPUs.
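
To see why TensorFlow rejects the GPU, its C++ log messages can be enabled before the import, and the build info shows which CUDA/cuDNN versions the bundled TensorFlow was built against. A sketch to run inside the container (nothing here is specific to this particular image):

python
# Sketch: surface TensorFlow's library-loading errors and report the CUDA/cuDNN
# versions this TensorFlow build expects.
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "0"   # must be set before importing tensorflow

import tensorflow as tf

print("TF version:", tf.__version__)
print("Build info:", tf.sysconfig.get_build_info())     # includes cuda_version / cudnn_version
print("GPUs:", tf.config.list_physical_devices("GPU"))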

3. Debugging Attempts

1️⃣ Checked for libcuda.so inside the container
find /usr -name "libcuda.so*"

Output (Files are present)
/usr/local/cuda-12.8/compat/lib.real/libcuda.so
/usr/local/cuda-12.8/compat/lib.real/libcuda.so.1
/usr/local/cuda-12.8/compat/lib.real/libcuda.so.570.86.10
/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so.1
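
Both a WSL passthrough copy (/usr/lib/x86_64-linux-gnu) and the container's forward-compat copy (/usr/local/cuda-12.8/compat) are present. A quick way to see what each copy reports is to call cuDriverGetVersion() on each file. This is a sketch only; the paths are taken from the find output above, and each probe runs in its own process so the two libraries never share an address space.

python
# Sketch: probe each libcuda copy found above and print the driver API version it
# reports (e.g. 12080 for CUDA 12.8). Each probe runs in a separate interpreter.
import subprocess, sys, textwrap

candidates = [
    "/usr/lib/x86_64-linux-gnu/libcuda.so.1",             # WSL driver passthrough
    "/usr/local/cuda-12.8/compat/lib.real/libcuda.so.1",  # container forward-compat copy
]

probe = textwrap.dedent("""
    import ctypes, sys
    lib = ctypes.CDLL(sys.argv[1])
    version = ctypes.c_int(0)
    lib.cuDriverGetVersion(ctypes.byref(version))   # CUresult cuDriverGetVersion(int *)
    print(sys.argv[1], "-> driver API version", version.value)
""")

for path in candidates:
    subprocess.run([sys.executable, "-c", probe, path], check=False)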

2️⃣ Adjusted LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:/usr/local/cuda-12.8/compat/lib.real:$LD_LIBRARY_PATH

Still, the same error persists.
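
It may also be worth confirming which file the dynamic loader actually resolves for libcuda.so.1 once LD_LIBRARY_PATH is changed, since the path order decides whether the WSL library or the compat library wins. A sketch; run it in a python3 started after the export, because the loader reads LD_LIBRARY_PATH only at process startup.

python
# Sketch: load libcuda.so.1 by soname and print which file was actually mapped.
import ctypes

ctypes.CDLL("libcuda.so.1")

with open("/proc/self/maps") as maps:
    mapped = {line.split()[-1] for line in maps if "libcuda" in line}

for path in sorted(mapped):
    print(path)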

3️⃣ Ran TensorFlow container with modified options

bash
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it nvcr.io/nvidia/tensorflow:25.01-tf2-py3 bash

🚨 Still, the "CUDA failed to initialize" error persists.

4. Summary & Questions

Findings so far:

  • The NVIDIA driver and CUDA work correctly in WSL2 and Docker.
  • The CUDA container (nvidia/cuda:12.3.0-base-ubuntu22.04) detects the GPU properly.
  • However, the TensorFlow container (nvcr.io/nvidia/tensorflow:25.01-tf2-py3) fails to initialize CUDA.
  • TensorFlow does not detect the GPU (tf.config.list_physical_devices('GPU') returns []).

Questions:

  1. Is the combination of NVIDIA RTX 5090 + CUDA 12.8 + TensorFlow 2.17.0 expected to work in WSL2?
  2. What could be causing the "CUDA failed to initialize" error in the TensorFlow container when the plain CUDA container works fine?
  3. Are there any additional configurations required to make TensorFlow detect the GPU inside the container?

Looking for guidance on debugging and fixing this issue.
Any insights or suggestions would be greatly appreciated!

5. My nvidia-bug-report.log

nvidia-bug-report.log (259.9 KB)

  1. Attaching nvidia-bug-report.log.gz would definitely increase chances of people looking into this.
  2. Where have you obtained driver version 572.16 from? I couldn’t find any downloads for it…
  3. Have you verified that it works in a simpler setup, for example directly on Linux without Windows + WSL? Also, would it be feasible to recreate the container's environment directly on Linux to omit the Docker layer? (It's an honest question, I have no idea TBH.)

I checked with the nvidia-smi command and found that the driver version reported for my RTX 5090 is 572.16. I have also added the nvidia-bug-report.log to my question.

Maybe the Windows driver? WSL…
