Cannot train a simple CNN with tensorflow-gpu on an RTX 5070 (Ubuntu 25.04, driver 570-open)

danilo.pau · May 21, 2025, 8:47am

Description

(tensorflowgpu) triumph@triumph-HP-Z6-G5-Workstation-Desktop-PC:~/gloria$ python test.py
2025-05-21 10:45:16.457139: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2025-05-21 10:45:16.471174: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1747817116.488001 8223 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1747817116.492755 8223 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1747817116.505430 8223 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1747817116.505470 8223 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1747817116.505473 8223 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1747817116.505481 8223 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-05-21 10:45:16.509482: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
W0000 00:00:1747817118.377848 8223 gpu_device.cc:2430] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
W0000 00:00:1747817118.385232 8223 gpu_device.cc:2430] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
I0000 00:00:1747817118.480241 8223 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 9500 MB memory: -> device: 0, name: NVIDIA GeForce RTX 5070, pci bus id: 0000:5e:00.0, compute capability: 12.0
2025-05-21 10:45:18.600840: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleLoadData(&module, data)' failed with 'CUDA_ERROR_INVALID_PTX'

2025-05-21 10:45:18.600866: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleGetFunction(&function, module, kernel_name)' failed with 'CUDA_ERROR_INVALID_HANDLE'

2025-05-21 10:45:18.600877: W tensorflow/core/framework/op_kernel.cc:1844] INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
2025-05-21 10:45:18.600894: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
Traceback (most recent call last):
  File "/home/triumph/gloria/test.py", line 5, in <module>
    x = keras.layers.Conv2D(64, (3,3), padding='same')(inputs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/triumph/anaconda3/envs/tensorflowgpu/lib/python3.12/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/triumph/anaconda3/envs/tensorflowgpu/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py", line 139, in convert_to_tensor
    return tf.cast(x, dtype)
           ^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InternalError: {{function_node _wrapped__Cast_device/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:Cast] name:
(tensorflowgpu) triumph@triumph-HP-Z6-G5-Workstation-Desktop-PC:~/gloria$
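When comparing logs like the one above across machines, the lines that seem most relevant are the PTX JIT warning naming the compute capability and the CUDA driver error codes. The helper below is a hypothetical sketch (`summarize_log` is not part of TensorFlow); its regular expressions assume the exact wording TensorFlow 2.19 printed here and may need adjusting for other versions.

```python
import re
import sys

# Assumed from the log above: TensorFlow warns when it has no native kernels
# for the device's compute capability ...
CC_WARNING = re.compile(
    r"not built with CUDA kernel binaries compatible with compute capability (\d+)\.(\d+)"
)
# ... and CUDA driver failures are reported as CUDA_ERROR_* codes.
CUDA_ERROR = re.compile(r"\bCUDA_ERROR_[A-Z_]+\b")


def summarize_log(text):
    """Return (set of (major, minor) capabilities lacking kernels, unique error codes in first-seen order)."""
    missing = {(int(m.group(1)), int(m.group(2))) for m in CC_WARNING.finditer(text)}
    errors = []
    for code in CUDA_ERROR.findall(text):
        if code not in errors:  # keep only the first occurrence of each code
            errors.append(code)
    return missing, errors


if __name__ == "__main__":
    # Usage: python summarize_log.py run.log   (file name is hypothetical)
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as fh:
            missing, errors = summarize_log(fh.read())
        print("missing native kernels for:", sorted(missing))
        print("CUDA errors:", errors)
```

Run against the output above, it would report compute capability 12.0 together with `CUDA_ERROR_INVALID_PTX` and `CUDA_ERROR_INVALID_HANDLE`, which makes it easier to see whether another report shows the same failure.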

(tensorflowgpu) triumph@triumph-HP-Z6-G5-Workstation-Desktop-PC:~/g$ conda list | grep tensorflow
# packages in environment at /home/triumph/anaconda3/envs/tensorflowgpu:
tensorflow 2.19.0 pypi_0 pypi

Environment

TensorRT Version:
GPU Type: GeForce RTX 5070
Nvidia Driver Version: 570.133.07
CUDA Version: 12.8
CUDNN Version:
Operating System + Version: Ubuntu 25.04
Python Version (if applicable): 3.12.7
TensorFlow Version (if applicable): 2.19.0
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

(tensorflowgpu) triumph@triumph-HP-Z6-G5-Workstation-Desktop-PC:~/gloria$ more test.py
import tensorflow as tf
from tensorflow import keras

inputs = keras.layers.Input(shape=(224,224,3), name="image_input")
x = keras.layers.Conv2D(64, (3,3), padding='same')(inputs)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.ReLU()(x)
x = keras.layers.GlobalAveragePooling2D()(x)
outputs = keras.layers.Dense(10, activation='softmax')(x)

model = keras.models.Model(inputs=inputs, outputs=outputs)

model.summary()
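To check that the failure is specific to the GPU path rather than to the model definition, the same layers can be built with the GPU hidden. This is a sketch that assumes setting `CUDA_VISIBLE_DEVICES` before TensorFlow is imported is enough to force CPU placement in this environment; `build_model` is a hypothetical wrapper around the layers from `test.py`.

```python
import os

# Hide all CUDA devices *before* TensorFlow is imported. CUDA treats an
# invalid index such as "-1" as "no visible devices", so TensorFlow
# places every op on the CPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


def build_model():
    """Build the same model as test.py; TensorFlow is imported lazily here."""
    from tensorflow import keras

    inputs = keras.layers.Input(shape=(224, 224, 3), name="image_input")
    x = keras.layers.Conv2D(64, (3, 3), padding="same")(inputs)
    x = keras.layers.BatchNormalization()(x)
    x = keras.layers.ReLU()(x)
    x = keras.layers.GlobalAveragePooling2D()(x)
    outputs = keras.layers.Dense(10, activation="softmax")(x)
    return keras.models.Model(inputs=inputs, outputs=outputs)


if __name__ == "__main__":
    try:
        model = build_model()
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        model.summary()
```

If this prints the summary while `test.py` fails, the problem lies with the GPU kernels rather than the Keras code. It is only a diagnostic and does not make the RTX 5070 usable for training.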

Steps To Reproduce

python test.py

pedrobombeirinho · May 25, 2025, 1:10pm

I'm facing the same issue with the same GPU, but on WSL2 with Ubuntu 24.04.
