TensorFlow CUDNN_STATUS_EXECUTION_FAILED

jkuschan · May 5, 2021, 9:39am

I have a new laptop with a Quadro T2000 graphics card, running Windows 10. Whenever I launch my TensorFlow pipeline, I get the error CUDNN_STATUS_EXECUTION_FAILED. I have installed the same configuration on other computers with different GPUs and never had this error. I'm working with TensorFlow 2.4, CUDA 11.0, and cuDNN 8.0.4 for CUDA 11. I also tried updating CUDA and cuDNN to 11.3, but got the same result. Running the code on the CPU only works. Maybe someone has had a similar problem and can help me.
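For anyone who wants to reproduce the CPU-only comparison mentioned above, here is a minimal sketch. The post does not say how the CPU-only run was done, so hiding the GPU through the `CUDA_VISIBLE_DEVICES` environment variable is an assumption, and the helper name `force_cpu_only` is hypothetical. The variable has to be set before TensorFlow is imported.

```python
import os


def force_cpu_only():
    """Hide all CUDA devices from libraries that are imported afterwards.

    Hypothetical helper: the original post does not state how the
    CPU-only run was configured.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


force_cpu_only()

try:
    # TensorFlow must be imported only after the variable is set.
    import tensorflow as tf

    visible_gpus = tf.config.list_physical_devices("GPU")
except ImportError:
    # TensorFlow is not installed here, so there is nothing to check.
    visible_gpus = []

print("GPUs visible to TensorFlow:", visible_gpus)
```

If the pipeline trains with the GPU hidden but fails with it visible, that points at the GPU code path rather than at the model or the data.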

`2021-04-29 09:40:42.865425: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-29 09:40:44.523 | INFO | hare.experiment_utilities:create_logdir:14 - Logging in test4
Split is: (0.73, 0.11, 0.16)
2021-04-29 09:40:44.524 | DEBUG | hare.datasets.oca:fetch_oca_variant:151 - Reading from c:\users*\documents\hare\hare\datasets....\data\oca.h5
2021-04-29 09:40:45.731 | DEBUG | hare.datasets.oca:fetch_oca_variant:177 - Keeping columns: imu0(acc|gyr)|imu1(acc|gyr)|imu2_(acc|gyr)|imu3_(acc|gyr)|imu4_(acc|gyr)|imu5_(acc|gyr)|imu6_(acc|gyr)|imu7_(acc|gyr)|imu8_(acc|gyr)|imu9_(acc|gyr)|imu10_(acc|gyr)|imu11_(acc|gyr)|label
2021-04-29 09:40:47.289 | DEBUG | hare.datasets.oca:_fetch_oca_variant:217 - Calculated input shape: (24, 24, 1)
2021-04-29 09:40:47.489 | DEBUG | hare.datasets.oca:_fetch_oca_variant:233 - Lengths: 25400, 3200, 6800
2021-04-29 09:40:47.491 | DEBUG | hare.datasets.oca:_fetch_oca_variant:236 - Dataset metadata: OCAMeta(input_shape=(24, 24, 1), n_features=24, label_map={0: ‘Null’, 1: ‘Mount Cover Panel’, 2: ‘Take Cover Panel Off’, 3: ‘Take Screwdriver’, 4: ‘Place Screwdriver Down’, 5: ‘Screw Unscrew Cover Panel’, 6: ‘Pick Up Screw’}, class_weights={0: 0.6713894830228953, 1: 1.9555214429626786, 2: 2.5954701348093314, 3: 2.174370061660349, 4: 2.670533257632004, 5: 0.2716948134350906, 6: 10.1079776506761}, mean=array([ 4.14772883, -3.01140435, 5.00519487, 0.79388079, -2.0454244 ,
0.67598148, 8.3959571 , 2.2111051 , 3.45433119, 2.25248151,
0.07432106, 0.53279361, -2.69410177, 4.22425256, -2.7555687 ,
-1.03421906, 0.06175712, -1.20503147, -8.62881747, 1.87319881,
3.0321183 , -1.93063112, 1.04115928, 0.84837715]))
Getting a lstm
2021-04-29 09:40:47.500207: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-29 09:40:47.500906: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-04-29 09:40:48.479080: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA Quadro T2000 with Max-Q Design computeCapability: 7.5
coreClock: 1.395GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 149.04GiB/s
2021-04-29 09:40:48.479238: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-29 09:40:48.490980: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-29 09:40:48.491286: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-29 09:40:48.494604: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-04-29 09:40:48.499897: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-04-29 09:40:48.507602: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-04-29 09:40:48.511182: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-04-29 09:40:48.512141: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-04-29 09:40:48.512354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-04-29 09:40:48.518978: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-29 09:40:48.520744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA Quadro T2000 with Max-Q Design computeCapability: 7.5
coreClock: 1.395GHz coreCount: 16 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 149.04GiB/s
2021-04-29 09:40:48.527435: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-29 09:40:48.527747: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-29 09:40:48.528274: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-29 09:40:48.528328: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-04-29 09:40:48.528468: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-04-29 09:40:48.528847: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-04-29 09:40:48.536321: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-04-29 09:40:48.536664: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-04-29 09:40:48.537400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-04-29 09:40:48.910311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-29 09:40:48.910463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-04-29 09:40:48.911616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-04-29 09:40:48.918609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2903 MB memory) → physical GPU (device: 0, name: NVIDIA Quadro T2000 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-04-29 09:40:48.919704: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-29 09:40:49.771 | DEBUG | hare.models.callbacks:create_callbacks:18 - test4\run_0, 200
2021-04-29 09:40:49.819184: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/1000
2021-04-29 09:40:53.338705: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-29 09:40:53.720560: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-29 09:40:53.728816: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-04-29 09:40:54.578844: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0

2021-04-29 09:40:54.620358: I tensorflow/core/platform/windows/subprocess.cc:308] SubProcess ended with return code: 0

2021-04-29 09:40:54.809019: E tensorflow/stream_executor/dnn.cc:616] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1859): ‘cudnnRNNForwardTraining( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, input_desc.handles(), input_data.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), rnn_desc.params_handle(), params.opaque(), output_desc.handles(), output_data->opaque(), output_h_desc.handle(), output_h_data->opaque(), output_c_desc.handle(), output_c_data->opaque(), workspace.opaque(), workspace.size(), reserve_space.opaque(), reserve_space.size())’
2021-04-29 09:40:54.809524: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at cudnn_rnn_ops.cc:1521 : Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1536, 128, 1, 8, 128, 128]
Traceback (most recent call last):
File “oca_split_variants.py”, line 92, in
exp()
File “C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\typer\main.py”, line 214, in call
return get_command(self)(*args, **kwargs)
File “C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\click\core.py”, line 829, in call
return self.main(*args, **kwargs)
File “C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\click\core.py”, line 782, in main
rv = self.invoke(ctx)
File “C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\click\core.py”, line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File “C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\click\core.py”, line 610, in invoke
return callback(*args, **kwargs)
File “C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\typer\main.py”, line 497, in wrapper
return callback(**use_params) # type: ignore
File “oca_split_variants.py”, line 75, in experiment
history = seshmodel.fit(
File “C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\keras\engine\training.py”, line 1100, in fit
tmp_logs = self.train_function(iterator)
File “C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\eager\def_function.py”, line 828, in call
result = self._call(*args, **kwds)
File “C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\eager\def_function.py”, line 888, in _call
return self._stateless_fn(*args, **kwds)
File “C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\eager\function.py”, line 2942, in call
return graph_function._call_flat(
File “C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\eager\function.py”, line 1918, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File “C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\eager\function.py”, line 555, in call
outputs = execute.execute(
File “C:\Users*\Anaconda3\envs\tf_cuda\lib\site-packages\tensorflow\python\eager\execute.py”, line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 1536, 128, 1, 8, 128, 128]
[[{{node CudnnRNN}}]]
[[sequential/lstm/PartitionedCall]] [Op:__inference_train_function_6539]

Function call stack:
train_function → train_function → train_function`
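The `ThenRnnForward` message in the log packs the failing RNN configuration into two bracketed lists. The sketch below decodes that line into named fields, which can make it easier to compare the failing layer against one that runs on another machine. It assumes the three mode integers follow the cuDNN enum ordering (`cudnnRNNMode_t`, `cudnnRNNInputMode_t`, `cudnnDirectionMode_t`); `parse_rnn_config` is a hypothetical helper, not part of TensorFlow.

```python
import re

# Assumed to mirror the cuDNN enum ordering.
RNN_MODES = {0: "CUDNN_RNN_RELU", 1: "CUDNN_RNN_TANH", 2: "CUDNN_LSTM", 3: "CUDNN_GRU"}
INPUT_MODES = {0: "CUDNN_LINEAR_INPUT", 1: "CUDNN_SKIP_INPUT"}
DIRECTION_MODES = {0: "CUDNN_UNIDIRECTIONAL", 1: "CUDNN_BIDIRECTIONAL"}

CONFIG_PATTERN = re.compile(
    r"\[rnn_mode, rnn_input_mode, rnn_direction_mode\]:"
    r"\s*(\d+),\s*(\d+),\s*(\d+)\s*,\s*"
    r"\[([a-z_, ]+)\]:\s*\[([\d, ]+)\]"
)


def parse_rnn_config(message):
    """Turn the 'model config' part of the error message into a dict."""
    match = CONFIG_PATTERN.search(message)
    if match is None:
        raise ValueError("no RNN model config found in message")
    mode, input_mode, direction = (int(match.group(i)) for i in (1, 2, 3))
    names = [name.strip() for name in match.group(4).split(",")]
    values = [int(value) for value in match.group(5).split(",")]
    config = {
        "rnn_mode": RNN_MODES.get(mode, mode),
        "rnn_input_mode": INPUT_MODES.get(input_mode, input_mode),
        "rnn_direction_mode": DIRECTION_MODES.get(direction, direction),
    }
    # Pair each dimension name with its value, e.g. batch_size -> 128.
    config.update(zip(names, values))
    return config


# The message text as it appears in the log above.
message = (
    "Failed to call ThenRnnForward with model config: "
    "[rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , "
    "[num_layers, input_size, num_units, dir_count, max_seq_length, "
    "batch_size, cell_num_units]: [1, 1536, 128, 1, 8, 128, 128]"
)
config = parse_rnn_config(message)
for key, value in config.items():
    print(f"{key}: {value}")
```

Under that assumption, mode 2 decodes to a unidirectional LSTM, which matches the `sequential/lstm` node named in the traceback.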

AakankshaS · May 21, 2021, 11:19am

Hi @jkuschan ,
Apologies for the delayed response.
This looks like a TensorFlow issue, so we recommend raising it in the corresponding TensorFlow forum.
Thanks!
