TensorFlow CUDNN_STATUS_EXECUTION_FAILED

I have a new laptop with a Quadro T2000 graphics card, running Windows 10. Whenever I try to launch my TensorFlow pipeline, I get the error CUDNN_STATUS_EXECUTION_FAILED. I have installed the same configuration on other computers with different GPUs and never had this error. I'm working with TensorFlow 2.4, CUDA 11.0, and cuDNN 8.0.4 for CUDA 11. I also tried updating CUDA and cuDNN for CUDA 11.3, with the same result. Running the code on the CPU only works. Has anyone had a similar problem and could help me?
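The CPU-only comparison run can be reproduced by hiding the GPU from CUDA before TensorFlow is loaded. A minimal sketch, assuming the standard `CUDA_VISIBLE_DEVICES` environment variable (the helper name `hide_gpus` is hypothetical and not part of the original pipeline):

```python
import os


def hide_gpus() -> None:
    """Hide all CUDA devices so TensorFlow falls back to the CPU.

    Hypothetical helper: it must run before `import tensorflow`, because
    CUDA reads CUDA_VISIBLE_DEVICES when it is initialised.
    """
    # "-1" is not a valid device index, so no GPU is visible to CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
```

Call `hide_gpus()` at the very top of the training script, before `import tensorflow`; `tf.config.list_physical_devices("GPU")` should then return an empty list. From inside TensorFlow, `tf.config.set_visible_devices([], "GPU")` has the same effect, as long as it runs before the GPU is initialised.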

```
2021-04-29 09:40:42.865425: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-29 09:40:44.523 | INFO | hare.experiment_utilities:create_logdir:14 - Logging in test4
Split is: (0.73, 0.11, 0.16)
2021-04-29 09:40:44.524 | DEBUG | hare.datasets.oca:_fetch_oca_variant:151 - Reading from c:\users*\documents\hare\hare\datasets....\data\oca.h5
2021-04-29 09:40:45.731 | DEBUG | hare.datasets.oca:_fetch_oca_variant:177 - Keeping columns: imu0_(acc|gyr)|imu1_(acc|gyr)|imu2_(acc|gyr)|imu3_(acc|gyr)|imu4_(acc|gyr)|imu5_(acc|gyr)|imu6_(acc|gyr)|imu7_(acc|gyr)|imu8_(acc|gyr)|imu9_(acc|gyr)|imu10_(acc|gyr)|imu11_(acc|gyr)|label
2021-04-29 09:40:47.289 | DEBUG | hare.datasets.oca:_fetch_oca_variant:217 - Calculated input shape: (24, 24, 1)
2021-04-29 09:40:47.489 | DEBUG | hare.datasets.oca:_fetch_oca_variant:233 - Lengths: 25400, 3200, 6800
2021-04-29 09:40:47.491 | DEBUG | hare.datasets.oca:_fetch_oca_variant:236 - Dataset metadata: OCAMeta(input_shape=(24, 24, 1), n_features=24, label_map={0: 'Null', 1: 'Mount Cover Panel', 2: 'Take Cover Panel Off', 3: 'Take Screwdriver', 4: 'Place Screwdriver Down', 5: 'Screw Unscrew Cover Panel', 6: 'Pick Up Screw'}, class_weights={0: 0.6713894830228953, 1: 1.9555214429626786, 2: 2.5954701348093314, 3: 2.174370061660349, 4: 2.670533257632004, 5: 0.2716948134350906, 6: 10.1079776506761}, mean=array([ 4.14772883, -3.01140435, 5.00519487, 0.79388079, -2.0454244 ,
0.67598148, 8.3959571 , 2.2111051 , 3.45433119, 2.25248151,
0.07432106, 0.53279361, -2.69410177, 4.22425256, -2.7555687 ,
-1.03421906, 0.06175712, -1.20503147, -8.62881747, 1.87319881,
3.0321183 , -1.93063112, 1.04115928, 0.84837715]))
Getting a lstm
2021-04-29 09:40:47.500207: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-29 09:40:47.500906: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-04-29 09:40:48.479080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA Quadro T2000 with Max-Q Design computeCapability: 7.5
coreClock: 1.395GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 149.04GiB/s
2021-04-29 09:40:48.479238: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-29 09:40:48.490980: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-29 09:40:48.491286: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-29 09:40:48.494604: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-04-29 09:40:48.499897: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-04-29 09:40:48.507602: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-04-29 09:40:48.511182: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-04-29 09:40:48.512141: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-04-29 09:40:48.512354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-04-29 09:40:48.518978: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-29 09:40:48.520744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA Quadro T2000 with Max-Q Design computeCapability: 7.5
coreClock: 1.395GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 149.04GiB/s
2021-04-29 09:40:48.527435: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-29 09:40:48.527747: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-29 09:40:48.528274: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-29 09:40:48.528328: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-04-29 09:40:48.528468: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-04-29 09:40:48.528847: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-04-29 09:40:48.536321: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-04-29 09:40:48.536664: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-04-29 09:40:48.537400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-04-29 09:40:48.910311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-29 09:40:48.910463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-04-29 09:40:48.911616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-04-29 09:40:48.918609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2903 MB memory) -> physical GPU (device: 0, name: NVIDIA Quadro T2000 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-04-29 09:40:48.919704: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-29 09:40:49.771 | DEBUG | hare.models.callbacks:create_callbacks:18 - test4\run_0, 200
2021-04-29 09:40:49.819184: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/1000
2021-04-29 09:40:53.338705: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-29 09:40:53.720560: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-29 09:40:53.728816: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-04-29 09:40:54.578844: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0

2021-04-29 09:40:54.620358: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0

2021-04-29 09:40:54.809019: E tensorflow/stream_executor/dnn.cc:616] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1859): 'cudnnRNNForwardTraining( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, input_desc.handles(), input_data.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), rnn_desc.params_handle(), params.opaque(), output_desc.handles(), output_data->opaque(), output_h_desc.handle(), output_h_data->opaque(), output_c_desc.handle(), output_c_data->opaque(), workspace.opaque(), workspace.size(), reserve_space.opaque(), reserve_space.size())'
2021-04-29 09:40:54.809524: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1521 : Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1536, 128, 1, 8, 128, 128]
Traceback (most recent call last):
  File "oca_split_variants.py", line 92, in <module>
    exp()
  File "C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\typer\main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\typer\main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "oca_split_variants.py", line 75, in experiment
    history = seshmodel.fit(
  File "C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\eager\def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\eager\def_function.py", line 888, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\eager\function.py", line 2942, in __call__
    return graph_function._call_flat(
  File "C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\eager\function.py", line 1918, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\eager\function.py", line 555, in call
    outputs = execute.execute(
  File "C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1536, 128, 1, 8, 128, 128]
[[{{node CudnnRNN}}]]
[[sequential/lstm/PartitionedCall]] [Op:__inference_train_function_6539]

Function call stack:
train_function -> train_function -> train_function
```
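The failing call is `cudnnRNNForwardTraining`, and the model config printed with it can be decoded into named fields. A minimal sketch, assuming the enum numbering of cuDNN's `cudnnRNNMode_t`, `cudnnRNNInputMode_t` and `cudnnDirectionMode_t` (the function and table names below are hypothetical):

```python
import re
from typing import Dict

# Assumed enum numbering, as in cuDNN's cudnnRNNMode_t, cudnnRNNInputMode_t
# and cudnnDirectionMode_t.
RNN_MODES = {0: "RNN_RELU", 1: "RNN_TANH", 2: "LSTM", 3: "GRU"}
INPUT_MODES = {0: "LINEAR_INPUT", 1: "SKIP_INPUT"}
DIRECTIONS = {0: "UNIDIRECTIONAL", 1: "BIDIRECTIONAL"}

# Field order exactly as printed in the ThenRnnForward error message.
DIM_NAMES = ["num_layers", "input_size", "num_units", "dir_count",
             "max_seq_length", "batch_size", "cell_num_units"]


def decode_rnn_config(message: str) -> Dict[str, object]:
    """Turn the 'model config' part of a ThenRnnForward error into a dict."""
    modes = re.search(
        r"\[rnn_mode, rnn_input_mode, rnn_direction_mode\]:\s*(\d+),\s*(\d+),\s*(\d+)",
        message)
    dims = re.search(r"cell_num_units\]:\s*\[([\d,\s]+)\]", message)
    if modes is None or dims is None:
        raise ValueError("message does not contain a ThenRnnForward model config")

    mode, input_mode, direction = (int(g) for g in modes.groups())
    # int() tolerates the spaces left around each value by split(",").
    values = [int(v) for v in dims.group(1).split(",")]

    config: Dict[str, object] = {
        # Unknown enum values are kept as raw integers.
        "rnn_mode": RNN_MODES.get(mode, mode),
        "rnn_input_mode": INPUT_MODES.get(input_mode, input_mode),
        "rnn_direction_mode": DIRECTIONS.get(direction, direction),
    }
    config.update(zip(DIM_NAMES, values))
    return config
```

Applied to the message in the log, it reports a single-layer, unidirectional LSTM with 128 units, an input size of 1536, a sequence length of 8 and a batch size of 128, consistent with the "Getting a lstm" step. Those are the values a minimal standalone reproduction of the `CudnnRNN` failure would need.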

Hi @jkuschan,
Apologies for the delayed response.
This looks like a TensorFlow issue, so we recommend raising it in the appropriate forum.
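Whichever forum the issue goes to, a report is easier to act on when it lists the exact versions involved. A minimal sketch that gathers them; the helper `collect_environment_info` is hypothetical, and it uses only the standard library plus, when TensorFlow is installed, `tf.sysconfig.get_build_info()` (assumed available, i.e. TensorFlow 2.3 or later):

```python
import importlib.util
import platform
from typing import Dict


def collect_environment_info() -> Dict[str, str]:
    """Gather version details that a bug report typically asks for.

    Hypothetical helper: TensorFlow is only imported if it is installed.
    """
    info = {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
    }
    if importlib.util.find_spec("tensorflow") is not None:
        import tensorflow as tf

        info["tensorflow"] = tf.__version__
        # CUDA/cuDNN versions TensorFlow was *built* against (TF >= 2.3).
        # They may differ from the toolkit versions installed on the machine.
        build = tf.sysconfig.get_build_info()
        info["built_cuda"] = str(build.get("cuda_version", "n/a"))
        info["built_cudnn"] = str(build.get("cudnn_version", "n/a"))
        # Number of GPUs TensorFlow can see (0 means it will run on the CPU).
        info["visible_gpus"] = str(len(tf.config.list_physical_devices("GPU")))
    return info


if __name__ == "__main__":
    for key, value in collect_environment_info().items():
        print(f"{key}: {value}")
```

Its output, together with the GPU model and the installed CUDA and cuDNN versions, gives maintainers the full configuration up front.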
Thanks!