How can I use my RTX 5060 Ti to train a model with TensorFlow?

Hi everybody,

To be honest, I’m a newbie. I’m trying to install the NVIDIA packages that TensorFlow requires, namely CUDA and cuDNN.
My CUDA version is 13.0, my cuDNN version is 9.13.1, and my TensorFlow version is 2.20.0.
I have tried many times, but running my script always produces output like this:
```
python3 bitcoin.py
2025-11-03 14:10:21.838224: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2025-11-03 14:10:21.882288: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-11-03 14:10:22.887128: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2.20.0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1762179023.723233 106736 gpu_device.cc:2431] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
/home/crush_dpl/.local/lib/python3.12/site-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an input_shape/input_dim argument to a layer. When using Sequential models, prefer using an Input(shape) object as the first layer in the model instead.
  super().__init__(**kwargs)
W0000 00:00:1762179024.002837 106736 gpu_device.cc:2431] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
I0000 00:00:1762179024.156613 106736 gpu_device.cc:2020] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13270 MB memory: -> device: 0, name: NVIDIA GeForce RTX 5060 Ti, pci bus id: 0000:01:00.0, compute capability: 12.0
2025-11-03 14:10:24.577763: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleLoadData(&module, data)' failed with 'CUDA_ERROR_INVALID_PTX'

2025-11-03 14:10:24.577810: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleGetFunction(&function, module, kernel_name)' failed with 'CUDA_ERROR_INVALID_HANDLE'

2025-11-03 14:10:24.577846: W tensorflow/core/framework/op_kernel.cc:1842] INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
2025-11-03 14:10:24.577881: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
Traceback (most recent call last):
  File "/home/crush_dpl/cs50p/week4/bitcoin/bitcoin.py", line 14, in <module>
    tf.keras.layers.Dropout(0.2),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/crush_dpl/.local/lib/python3.12/site-packages/keras/src/layers/regularization/dropout.py", line 53, in __init__
    self.seed_generator = backend.random.SeedGenerator(seed)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/crush_dpl/.local/lib/python3.12/site-packages/keras/src/random/seed_generator.py", line 87, in __init__
    self.state = self.backend.Variable(
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/crush_dpl/.local/lib/python3.12/site-packages/keras/src/backend/common/variables.py", line 206, in __init__
    self._initialize_with_initializer(initializer)
  File "/home/crush_dpl/.local/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py", line 52, in _initialize_with_initializer
    self._initialize(lambda: initializer(self._shape, dtype=self._dtype))
  File "/home/crush_dpl/.local/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py", line 42, in _initialize
    self._value = tf.Variable(
                  ^^^^^^^^^^^^
  File "/home/crush_dpl/.local/lib/python3.12/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/crush_dpl/.local/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py", line 52, in <lambda>
    self._initialize(lambda: initializer(self._shape, dtype=self._dtype))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/crush_dpl/.local/lib/python3.12/site-packages/keras/src/random/seed_generator.py", line 84, in seed_initializer
    return self.backend.convert_to_tensor([seed, 0], dtype=dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/crush_dpl/.local/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py", line 152, in convert_to_tensor
    return tf.cast(x, dtype)
           ^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InternalError: {{function_node __wrapped__Cast_device_/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:Cast] name:
```

This is my code, just a simple one:

```python
import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([

  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')

])

model.compile(optimizer='adam',
  loss='sparse_categorical_crossentropy',
  metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
```

What should I do to fix this? Or should I downgrade my cuDNN and CUDA?
My thanks!

The RTX 5060 Ti has compute capability 12.0, which is a very new architecture. Stable TensorFlow releases do not ship pre-compiled GPU kernels for it, so TensorFlow falls back to JIT-compiling kernels from PTX, and that compilation fails for certain operations (especially float32 ones).

Error mechanism:

  • TensorFlow looks for pre-compiled GPU kernels and finds none for compute capability 12.0.

  • It then attempts JIT compilation from PTX code, which fails with CUDA_ERROR_INVALID_PTX.

  • Float32 operations fail while some int32 operations still work, because they use different kernels.
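As a rough illustration of that sequence, here is a minimal Python sketch that scans a TensorFlow log for the two markers quoted in the question: the "jit-compiled from PTX" warning and the CUDA_ERROR_INVALID_PTX failure. The helper name `diagnose_tf_log` and the regular expression are my own assumptions for illustration; they are not part of TensorFlow.

```python
import re

# Patterns based on the log lines quoted in the question (assumed wording).
CAPABILITY_RE = re.compile(r"compute capability:? (\d+)\.(\d+)")
JIT_WARNING = "CUDA kernels will be jit-compiled from PTX"
PTX_FAILURE = "CUDA_ERROR_INVALID_PTX"


def diagnose_tf_log(log_text):
    """Hypothetical helper: summarize the PTX fallback path seen in a log."""
    match = CAPABILITY_RE.search(log_text)
    capability = (int(match.group(1)), int(match.group(2))) if match else None
    return {
        "compute_capability": capability,
        # True when TensorFlow reports it has no pre-compiled kernels.
        "missing_precompiled_kernels": JIT_WARNING in log_text,
        # True when the PTX fallback itself failed to load.
        "ptx_jit_failed": PTX_FAILURE in log_text,
    }


sample_log = (
    "W0000 gpu_device.cc:2431] TensorFlow was not built with CUDA kernel binaries "
    "compatible with compute capability 12.0. CUDA kernels will be jit-compiled "
    "from PTX, which could take 30 minutes or longer.\n"
    "W tf_gpu_runtime_wrappers.cc:40] 'cuModuleLoadData(&module, data)' failed "
    "with 'CUDA_ERROR_INVALID_PTX'\n"
)
print(diagnose_tf_log(sample_log))
```

If both flags come back true, the log matches the situation described here: no pre-compiled kernels for the card, and the PTX fallback did not load either.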

I faced the same problem with my RTX 5060 GPU. The steps below solved it for me.

Step-by-step solution:

Create a conda environment with Python 3.11.4 and activate it.

```
conda create --name tf_gpu python=3.11.4 pip
conda activate tf_gpu
```

Install the CUDA toolkit. This provides the CUDA 12.5.1 runtime libraries that match TensorFlow’s requirements, and using conda avoids system-wide installation conflicts.

```
conda install nvidia/label/cuda-12.5.1::cuda-toolkit
```

Install TensorFlow nightly. Nightly builds carry the latest GPU kernel support; in my case that included pre-compiled kernels for compute capability 12.0, which eliminated the JIT compilation failures.

```
pip3 install tf-nightly
```

Configure the library path so that TensorFlow can locate the CUDA libraries installed by conda. The setting is stored with the environment and applied automatically on every activation. Reactivate the environment afterwards so the variable takes effect.

```
conda env config vars set LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH

conda deactivate
conda activate tf_gpu
```
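To check that the variable really took effect after reactivating, something like the following Python sketch may help. It only inspects environment variables; the function name `conda_lib_on_library_path` is hypothetical, and it assumes the conda libraries live in `$CONDA_PREFIX/lib` as in the command above.

```python
import os


def conda_lib_on_library_path(environ=None):
    """Return True if $CONDA_PREFIX/lib is listed in LD_LIBRARY_PATH."""
    env = os.environ if environ is None else environ
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return False  # no conda environment appears to be active
    expected = os.path.normpath(os.path.join(prefix, "lib"))
    entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    # Compare normalized paths so a trailing slash does not matter.
    return any(os.path.normpath(entry) == expected for entry in entries if entry)


print("conda lib dir on LD_LIBRARY_PATH:", conda_lib_on_library_path())
```

Run it inside the activated `tf_gpu` environment. If it prints `False`, the environment was probably not reactivated after the `conda env config vars set` command.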

This approach gave me a working GPU-enabled TensorFlow installation on the RTX 5060. Keep in mind that tf-nightly is a pre-release build rather than a stable release.
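To confirm that the GPU is actually usable, a small smoke test along these lines might be a reasonable starting point. It is only a sketch: the function name `gpu_smoke_test` and the report layout are my own, and it runs a single Cast op (the op type that failed in the traceback in the question) rather than a full training run.

```python
def gpu_smoke_test():
    """Run one small Cast op on the first GPU and report what happened."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"tensorflow": None, "gpus": [], "float32_cast_ok": False}

    report = {"tensorflow": tf.__version__, "gpus": [], "float32_cast_ok": False}
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        details = tf.config.experimental.get_device_details(gpu)
        report["gpus"].append(
            (details.get("device_name"), details.get("compute_capability"))
        )
    if gpus:
        try:
            with tf.device("/GPU:0"):
                value = tf.cast(tf.constant([1, 2, 3]), tf.float32)
            report["float32_cast_ok"] = bool(
                (value.numpy() == [1.0, 2.0, 3.0]).all()
            )
        except tf.errors.OpError as err:
            # Keep only the first line of the (often very long) message.
            report["error"] = str(err).splitlines()[0]
    return report


print(gpu_smoke_test())
```

If `float32_cast_ok` is true and the card is listed with compute capability `(12, 0)`, the MNIST script from the question should be worth retrying.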