RTX 4060, Win11, TF 2.19.0, CUDA 12.3.2 - GPU not detected despite nvidia-smi/deviceQuery PASS

TensorFlow GPU Detection Issue Troubleshooting Summary

Hello. I am seeking assistance with an issue where TensorFlow is unable to detect the GPU on my system, which has an NVIDIA GeForce RTX 4060. I have attempted various troubleshooting steps, but the problem persists.

1. System and Software Environment:

  • GPU: NVIDIA GeForce RTX 4060
  • Operating System: Windows 11
  • Python Version: 3.11.9
  • NVIDIA Driver Version: 566.36 (Lowest compatible driver version for Windows 11)
  • CUDA Toolkit Version: 12.3.2 (January 2024)
  • cuDNN Version: 8.9.7 (for CUDA 12.x)
  • TensorFlow Version: 2.19.0 (initial attempt), 2.12.0 (intermediate attempt), 2.16.1 (final attempt)
    • Currently, I have attempted with TensorFlow 2.19.0 (officially supporting CUDA 12.3, cuDNN 8.9.7) and 2.16.1 (compatible with CUDA 12.x, cuDNN 8.9.x).
  • Visual C++ Redistributable Packages: 2015-2022 (x64) repaired.

2. Problem Symptoms:

  • tf.config.list_physical_devices('GPU') consistently returns an empty list.
  • TensorFlow outputs the message “GPU not detected. Using CPU.” and operates solely on the CPU.
  • Even with TF_CPP_MIN_LOG_LEVEL set to 0 , no GPU-related loading failure messages are displayed.

3. Troubleshooting Steps Attempted and Results:

  • NVIDIA Driver Clean Reinstallation (using DDU):
    • Used DDU to completely remove existing drivers, then reinstalled the latest driver (576.88) from the NVIDIA official website using the “Perform a clean installation” option.
    • Result: Driver installed, but nvidia-smi reported CUDA Version: 12.9 , suggesting a version mismatch with TensorFlow 2.19.0 (CUDA 12.3).
    • Subsequently, performed another DDU clean reinstallation with 566.36 , the lowest driver version available for Windows 11.
    • Result: nvidia-smi correctly reported CUDA Version: 12.7 . (Theoretically compatible with CUDA 12.3).
  • CUDA Toolkit and cuDNN Version Matching:
    • Installed CUDA Toolkit 12.3.2 and cuDNN 8.9.7, precisely matching TensorFlow 2.19.0’s official requirements, and correctly copied cuDNN files to the CUDA Toolkit path.
    • An intermediate attempt involved downgrading to TensorFlow 2.12.0 with CUDA Toolkit 11.8 and cuDNN 8.6.0, but GPU detection still failed. (Later reverted to 12.3.2/8.9.7).
  • Environment Variable Path Setting:
    • The following four paths were placed at the very top of the system Path variable:
      • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\bin
      • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\libnvvp
      • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include
      • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\lib\x64
  • Virtual Environment Usage:
    • All TensorFlow installations and tests were performed within clean virtual environments created with python -m venv . (Using an elevated/administrator CMD).
  • TF_FORCE_GPU_ALLOW_GROWTH Environment Variable Setting:
    • Set the system environment variable TF_FORCE_GPU_ALLOW_GROWTH to true and rebooted the system.
  • NVIDIA Driver and CUDA Toolkit Self-Diagnosis:
    • nvidia-smi output (after installing driver 566.36):
Fri Jul  4 14:24:57 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 566.36           Driver Version: 566.36        CUDA Version: 12.7      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |           MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060       WDDM |   00000000:0B:00.0  On |                  N/A |
|  0%   49C    P5              N/A /  115W |   1188MiB /   8188MiB |     0%    Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI      PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|   0   N/A  N/A     2252   C+G    ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A      |
|  ... (truncated) ...
+-----------------------------------------------------------------------------------------+
* **Result:** Driver version is 566.36, and `CUDA Version` is correctly reported as 12.7.
  • deviceQuery.exe output (after installing CUDA Toolkit 12.3.2):
deviceQuery.exe Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 4060"
  CUDA Driver Version / Runtime Version           12.7 / 12.3
  CUDA Capability Major/Minor version number:      8.9
  Total amount of global memory:                   8188 MBytes (8585216000 bytes)
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
  (24) Multiprocessors, (128) CUDA Cores/MP:       3072 CUDA Cores
  GPU Max Clock rate:                              2505 MHz (2.50 GHz)
  Memory Clock rate:                               8501 Mhz
  Memory Bus Width:                                128-bit
  L2 Cache Size:                                   25165824 bytes
  Maximum Texture Dimension Size (x,y,z)           1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers    1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers    2D=(32768, 32768), 2048 layers
  Total amount of constant memory:                 zu bytes
  Total amount of shared memory per block:         zu bytes
  Total number of registers available per block: 65536
  Warp size:                                       32
  Maximum number of threads per multiprocessor:    1536
  Maximum number of threads per block:             1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                            zu bytes
  Texture alignment:                               zu bytes
  Concurrent copy and kernel execution:            Yes with 1 copy engine(s)
  Run time limit on kernels:                       Yes
  Integrated GPU sharing Host Memory:              No
  Support host page-locked memory mapping:         Yes
  Alignment requirement for Surfaces:              Yes
  Device has ECC support:                          Disabled
  CUDA Device Driver Mode (TCC or WDDM):           WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):        Yes
  Device supports Compute Preemption:              Yes
  Supports Cooperative Kernel Launch:              Yes
  Supports MultiDevice Co-op Kernel Launch:        No
  Device PCI Domain ID / Bus ID / location ID:     0 / 11 / 0
  Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.7, CUDA Runtime Version = 12.3, NumDevs = 1, Device0 = NVIDIA GeForce RTX 4060
Result = PASS
* **Result:** Confirmed that the GPU communicates perfectly with the CUDA driver (12.7) and runtime (12.3), indicating normal operation at the system level.

4. Conclusion and Request for Assistance:

Both nvidia-smi and deviceQuery clearly show that the GPU and CUDA environment are functioning correctly at the system level. All version compatibilities (driver, CUDA Toolkit, cuDNN, TensorFlow) also align with official requirements. Despite this, TensorFlow fails to detect the GPU, which is highly unusual.

I would appreciate any insights into potential overlooked issues or known problems specific to this environment (Windows 11, RTX 4060, Python 3.11.9). Any additional methods for diagnosing why TensorFlow might be failing to load GPU-related DLLs would be greatly helpful.

Currently, I am unable to proceed with GPU-accelerated deep learning training due to this issue. Your assistance would be greatly appreciated.

Any solution? I have literally the exact same situation

Unfortunately, I haven’t found a solution to this issue yet. When I contacted NVIDIA, I received the following response:

Hello,
Thank you for contacting Nvidia Customer Care,
This is Prakash, assisting you in troubleshooting the issue that you are experiencing.
From the issue description, I understand that you are experiencing issues with the GPU not detecting in tensorflow,.
I apologize for any inconvenience this may have caused. Please be assured that I will do my best to help you.

  1. check if tensorflow sees your GPU
  2. check if your videocard can work with tensorflow
  3. find versions of CUDA Toolkit and cuDNN SDK, compatible with your tf version
    Build from source  |  TensorFlow
  4. install CUDA Toolkit
    CUDA Toolkit Archive | NVIDIA Developer
  5. check active CUDA version and switch it (if necessary)
  6. install cuDNN SDK
    cuDNN Archive | NVIDIA Developer
  7. pip uninstall tensorflow; pip install tensorflow-gpu
  8. check if tensorflow sees your GPU

If still same, I suggest you to post you query to our development team, please register at the developer’s web site:
http://developer.nvidia.com/page/home.html
The best solution for your query is to look at our Developer Zone.

Looking forward for your update,
Best regards NVIDIA Customer Care

However, I am still in the process of resolving it myself.
Currently, I am unfortunately only using the CPU. Once my current tasks are complete, I plan to explore various other solutions.

If I find a solution, I will be sure to share it with you.


“The process I tried is as follows, and I did not separately implement the methods suggested by NVIDIA, as our approach covered those steps.
Regarding the methods we tried, it’s the process you mentioned, starting from TensorFlow, where we continuously checked if TensorFlow recognized the GPU using tf.config.list_physical_devices(‘GPU’), focused on finding compatible versions of CUDA Toolkit and cuDNN SDK for various TensorFlow versions, installed different CUDA versions, attempted to switch active CUDA versions via environment variables, downloaded and installed matching cuDNN SDKs, and repeatedly reinstalled TensorFlow using pip uninstall tensorflow; pip install tensorflow-gpu whenever versions were changed.”