cudnn lstm is broken above driver 431.60, 'Unexpected Event status: 1 cuda'

onurcetinkol · December 9, 2019, 9:55am

Cuda: 10.1
cudnn: 7.6.4
os: windows 10
gpu: rtx 2060

When the model gets more complex, for example with more than 3 LSTM layers, I randomly get ‘Unexpected Event status: 1 cuda’ on both TensorFlow (2.0) and PyTorch (1.3). The latest drivers I could find that don’t have the problem are 431.60 (Game Ready) and 431.86 (Studio).

E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-12-11 12:57:15.515019: E tensorflow/stream_executor/dnn.cc:596] CUDNN_STATUS_INTERNAL_ERROR
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1802): 'cudnnRNNForwardTraining( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, input_desc.handles(), input_data.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), rnn_desc.params_handle(), params.opaque(), output_desc.handles(), output_data->opaque(), output_h_desc.handle(), output_h_data->opaque(), output_c_desc.handle(), output_c_data->opaque(), workspace.opaque(), workspace.size(), reserve_space.opaque(), reserve_space.size())'
2019-12-11 12:57:15.519102: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1

onurcetinkol · December 10, 2019, 9:11pm

edit: I thought cudnn 7.6.5 fixed the problem, but it didn’t.

wdulaney · January 30, 2020, 4:44am

I was getting the same error with 441.66… I downgraded to 431.86 and now have no issues.

I had the same issues with TF 2.0 and 2.1.

tensorflow 2.1 / 2.0
CUDA 10.1 / 10.0
cudnn 7.6.5 / 7.4.x
windows 10
gtx 1650

wdulaney · February 11, 2020, 3:44am

Any ideas on how to solve this problem with newer drivers? Is it really a driver issue?

https://www.mathworks.com/matlabcentral/answers/485733-cuda-crashes-when-training-lstm-on-geforce-rtx-2080-super

“Downgrading(!) the NVIDIA driver to the last stable studio driver (431.86) solved the issue.”

shashankmurthy1996 · February 26, 2020, 6:43pm

Either your output is very long or your batch size isn’t large enough. Try a batch size of 32 and see if it arrives at this fault faster…

I’ll explain what’s actually happening later, but this is the quickest solution. You need a dedicated GPU for this, or else cudnnLSTM can’t work at its best.

Either your data leans towards sparse, or the recurrent sequence update is getting very big and the GPU fails to execute the malloc statement, so it fails when submitting the forward direction of your RNN. There is more to this that I will explain at a later time…

import tensorflow as tf  # TensorFlow 1.x API
# Let this process use up to 95% of the GPU's memory
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.95)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

Let me know if this works for you. I ran into this a lot as well, so this is the basic conclusion I’ve come to.

SunilJB · May 12, 2020, 11:09am

Could you please let us know if you are still facing this issue?

Thanks

rodrigo.ruiz7 · May 20, 2020, 7:14pm

Hi, I’m having the same issue I think.

I tried increasing the batch size on a toy example (though that wouldn’t be possible in a real scenario) and I also tried:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.95
session = InteractiveSession(config = config)

I used the compat.v1 API because your code no longer works on TensorFlow 2.
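For reference, TensorFlow 2.x also has a native way to change how GPU memory is allocated. The sketch below is only an illustration, not a confirmed fix: the reporter of the first GitHub issue linked later in this thread lists allowing GPU memory growth among the things already tried without success. The function name `enable_gpu_memory_growth` is hypothetical, and the code assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand instead of up front.

    Returns the number of GPUs that were configured. This must be called
    before any GPU has been initialized; otherwise TensorFlow raises a
    RuntimeError.
    """
    # Imported lazily so this helper can be defined without TensorFlow installed.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Hypothetical workaround only; it is not confirmed to stop the crash.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Call `enable_gpu_memory_growth()` once at the top of the training script, before building the model.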

I’m also unable to downgrade my drivers to the 431.86 Studio version; I get the error message “Your system requires a Standard driver package…”

What should I do?

SunilJB · May 21, 2020, 5:29am

Hi,
Can you try installing the latest cuDNN version with the following system settings: “CUDA 10.2 and driver r440”?

Thanks

kein520 · June 12, 2020, 9:18am

I am having a similar issue on TensorFlow 2.2.0. I followed the breadcrumbs here.

github.com/tensorflow/tensorflow

Error in cuda_dnn.cc(1921) with cudnnRNNBackwardData. Failed to call ThenRNNBackward

opened 05:12PM - 16 Jan 20 UTC

closed 11:02PM - 22 Jan 20 UTC

Kal213

type:support comp:keras TF 2.1

<em>Please make sure that this is a bug. As per our [GitHub Policy](https://gith…ub.com/tensorflow/tensorflow/blob/master/ISSUES.md), we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template</em>

**System information**
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): [No](https://www.tensorflow.org/tutorials/text/text_classification_rnn)
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: n/a
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): v2.1.0-rc1-58-g9837eceb39
- Python version: 3.6.8
- Bazel version (if compiling from source): n/a
- GCC/Compiler version (if compiling from source): n/a
- CUDA/cuDNN version: 10.1.243 / 7.6.0.64
- GPU model and memory: NVIDIA GeForce GTX 1050, 4.00GiB

**Describe the current behavior**
I'm attempting to learn about Recurrent Neural Networks following [this](https://www.tensorflow.org/tutorials/text/text_classification_rnn) guide from Tensorflow. For some reason whenever I try to run the network it fails. Interestingly network only ever seems to get past one epoch when verbose is set to 1, or excluded. In this case it will typically complete 1-3 epochs before failing.

**Other info / logs**
Potentially similar to [this issue](https://github.com/tensorflow/tensorflow/issues/35791)
```
2020-01-16 10:26:44.374373: E tensorflow/stream_executor/dnn.cc:596] CUDNN_STATUS_INTERNAL_ERROR
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1802): 'cudnnRNNForwardTraining( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, input_desc.handles(), input_data.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), rnn_desc.params_handle(), params.opaque(), output_desc.handles(), output_data->opaque(), output_h_desc.handle(), output_h_data->opaque(), output_c_desc.handle(), output_c_data->opaque(), workspace.opaque(), workspace.size(), reserve_space.opaque(), reserve_space.size())'
2020-01-16 10:26:44.375949: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at cudnn_rnn_ops.cc:1517 : Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 64, 64, 1, 1850, 64, 64]
2020-01-16 10:26:44.376544: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 64, 64, 1, 1850, 64, 64]
     [[{{node CudnnRNN}}]]
2020-01-16 10:26:44.377273: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: {{function_node __forward_cudnn_lstm_with_fallback_4560_specialized_for_sequential_bidirectional_backward_lstm_StatefulPartitionedCall_at___inference_distributed_function_5790}} {{function_node __forward_cudnn_lstm_with_fallback_4560_specialized_for_sequential_bidirectional_backward_lstm_StatefulPartitionedCall_at___inference_distributed_function_5790}} Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 64, 64, 1, 1850, 64, 64]
     [[{{node CudnnRNN}}]]
     [[sequential/bidirectional/backward_lstm/StatefulPartitionedCall]]
     [[Reshape_11/_38]]
2020-01-16 10:26:44.379170: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: {{function_node __forward_cudnn_lstm_with_fallback_4560_specialized_for_sequential_bidirectional_backward_lstm_StatefulPartitionedCall_at___inference_distributed_function_5790}} {{function_node __forward_cudnn_lstm_with_fallback_4560_specialized_for_sequential_bidirectional_backward_lstm_StatefulPartitionedCall_at___inference_distributed_function_5790}} Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 64, 64, 1, 1850, 64, 64]
     [[{{node CudnnRNN}}]]
     [[sequential/bidirectional/backward_lstm/StatefulPartitionedCall]]
Traceback (most recent call last):
  File "C:\Users\Cal\Desktop\python\NN\RNN\RNN.py", line 50, in <module>
    validation_steps=30)
  File "C:\Users\Cal\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\Cal\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "C:\Users\Cal\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Users\Cal\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "C:\Users\Cal\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\Cal\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 599, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "C:\Users\Cal\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\eager\function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "C:\Users\Cal\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\eager\function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "C:\Users\Cal\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\eager\function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Users\Cal\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\eager\function.py", line 545, in call
    ctx=ctx)
  File "C:\Users\Cal\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: [_Derived_] Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 64, 64, 1, 1850, 64, 64]
     [[{{node CudnnRNN}}]]
     [[sequential/bidirectional/backward_lstm/StatefulPartitionedCall]]
     [[Reshape_11/_38]] [Op:__inference_distributed_function_5790]

Function call stack:
distributed_function -> distributed_function -> distributed_function
```

**Describe the expected behavior**
The network trains without error.

**Code to reproduce the issue**
````
from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow_datasets as tfds
import tensorflow as tf
import matplotlib.pyplot as plt

def plot_graphs(history, string):
    plt.plot(history.history[string])
    plt.plot(history.history['val_'+string], '')
    plt.xlabel("Epochs")
    plt.ylabel(string)
    plt.legend([string, 'val_'+string])
    plt.show()

dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

encoder = info.features['text'].encoder

BUFFER_SIZE = 10000
BATCH_SIZE = 64

train_dataset = train_dataset.shuffle(BUFFER_SIZE)
train_dataset = train_dataset.padded_batch(BATCH_SIZE, tf.compat.v1.data.get_output_shapes(train_dataset))
test_dataset = test_dataset.padded_batch(BATCH_SIZE, tf.compat.v1.data.get_output_shapes(test_dataset))

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

history = model.fit(train_dataset, epochs=10, verbose=0,
                    validation_data=test_dataset,
                    validation_steps=30)

test_loss, test_acc = model.evaluate(test_dataset)

print('Test Loss: {}'.format(test_loss))
print('Test Accuracy: {}'.format(test_acc))
````

**Already Tried:**
- Updating tensorflow to 2.1, then to 2.1rc1.
- Updating CUDA
- Updating cudNN to 7.6.5.32
- Allowing GPU memory growth.

github.com/tensorflow/tensorflow

GPU-accelerated LSTMs/GRUs crash randomly with: [ InternalError: [_Derived_] Failed to call ThenRnnBackward with model config ]

opened 06:50AM - 12 Jun 20 UTC

closed 10:26PM - 12 Jun 20 UTC

leehanchung

type:bug comp:keras comp:gpu TF 2.2

<em>Please make sure that this is a bug. As per our [GitHub Policy](https://git…hub.com/tensorflow/tensorflow/blob/master/ISSUES.md), we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template</em>

**System information**
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 Pro, Build 19041
- TensorFlow installed from (source or binary): pip install tensorflow
- TensorFlow version (use command below): v2.2.0-rc4-8-g2b96f3662b 2.2.0
- Python version: 3.7.4
- CUDA/cuDNN version: CUDA 10.1, cuDNN 7.6.5
- GPU model and memory: NVidia Titan RTX, 24GB, RTX 2080 Ti, 11GB
- nvidia driver version: 450.99

**Describe the current behavior**
Both the Jupyter Notebook and extract Python script on the [Tensorflow Text Classification Tutorial ](https://www.tensorflow.org/tutorials/text/text_classification_rnn) crashes randomly when training locally on my GPU, with the following traceback:
```
tensorflow.python.framework.errors_impl.InternalError: [_Derived_] Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 64, 64, 1, 2537, 64, 64]
     [[{{node CudnnRNN}}]]
     [[sequential/bidirectional/forward_lstm/StatefulPartitionedCall]]
     [[gradient_tape/sequential/embedding/embedding_lookup/Reshape/_38]] [Op:__inference_train_function_6128]

Function call stack:
train_function -> train_function -> train_function
```
I found two similar issues [#37942](https://github.com/tensorflow/tensorflow/issues/37942) and [#35950 ](https://github.com/tensorflow/tensorflow/issues/35950). Methods suggested in #37942 did not work and still crashes.

**Describe the expected behavior**
Example tutorial notebooks should run smoothly from top to bottom without random crashes.

**Standalone code to reproduce the issue**
[Github Gist here.](https://gist.github.com/leehanchung/8da991bf1264c19324920349171386bc)
Code:
```
import tensorflow_datasets as tfds
import tensorflow as tf
import matplotlib.pyplot as plt

def plot_graphs(history, metric):
    plt.plot(history.history[metric])
    plt.plot(history.history['val_'+metric], '')
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend([metric, 'val_'+metric])
    plt.show()

dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

encoder = info.features['text'].encoder
print('Vocabulary size: {}'.format(encoder.vocab_size))

sample_string = 'Hello TensorFlow.'

encoded_string = encoder.encode(sample_string)
print('Encoded string is {}'.format(encoded_string))

original_string = encoder.decode(encoded_string)
print('The original string: "{}"'.format(original_string))

assert original_string == sample_string

for index in encoded_string:
    print('{} ----> {}'.format(index, encoder.decode([index])))

BUFFER_SIZE = 10000
BATCH_SIZE = 64

train_dataset = train_dataset.shuffle(BUFFER_SIZE)
train_dataset = train_dataset.padded_batch(BATCH_SIZE)
test_dataset = test_dataset.padded_batch(BATCH_SIZE)

for example_batch, label_batch in train_dataset.take(20):
    print("Batch shape:", example_batch.shape)
    print("label shape:", label_batch.shape)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])

model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

history = model.fit(train_dataset, epochs=10,
                    validation_data=test_dataset,
                    validation_steps=30)

test_loss, test_acc = model.evaluate(test_dataset)

print('Test Loss: {}'.format(test_loss))
print('Test Accuracy: {}'.format(test_acc))
```

**Other info / logs**
```
Epoch 1/10
2020-06-11 23:48:47.036226: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-06-11 23:48:47.417459: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
 39/391 [=>............................] - ETA: 33s - loss: 0.6931 - accuracy: 0.5028
2020-06-11 23:48:52.108366: E tensorflow/stream_executor/dnn.cc:613] CUDNN_STATUS_INTERNAL_ERROR
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1986): 'cudnnRNNBackwardData( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, output_desc.handles(), output_data.opaque(), output_desc.handles(), output_backprop_data.opaque(), output_h_desc.handle(), output_h_backprop_data.opaque(), output_c_desc.handle(), output_c_backprop_data.opaque(), rnn_desc.params_handle(), params.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), input_desc.handles(), input_backprop_data->opaque(), input_h_desc.handle(), input_h_backprop_data->opaque(), input_c_desc.handle(), input_c_backprop_data->opaque(), workspace.opaque(), workspace.size(), reserve_space_data->opaque(), reserve_space_data->size())'
2020-06-11 23:48:52.109818: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at cudnn_rnn_ops.cc:1922 : Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 64, 64, 1, 1615, 64, 64]
Traceback (most recent call last):
  File ".\lesson1.py", line 59, in <module>
    validation_steps=30)
  File "C:\Users\Han\.virtualenvs\tensorflow-in-practice-9XcfUv0Y\lib\site-packages\tensorflow\python\keras\engine\training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "C:\Users\Han\.virtualenvs\tensorflow-in-practice-9XcfUv0Y\lib\site-packages\tensorflow\python\keras\engine\training.py", line 848, in fit
    tmp_logs = train_function(iterator)
  File "C:\Users\Han\.virtualenvs\tensorflow-in-practice-9XcfUv0Y\lib\site-packages\tensorflow\python\eager\def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\Han\.virtualenvs\tensorflow-in-practice-9XcfUv0Y\lib\site-packages\tensorflow\python\eager\def_function.py", line 611, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "C:\Users\Han\.virtualenvs\tensorflow-in-practice-9XcfUv0Y\lib\site-packages\tensorflow\python\eager\function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "C:\Users\Han\.virtualenvs\tensorflow-in-practice-9XcfUv0Y\lib\site-packages\tensorflow\python\eager\function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "C:\Users\Han\.virtualenvs\tensorflow-in-practice-9XcfUv0Y\lib\site-packages\tensorflow\python\eager\function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Users\Han\.virtualenvs\tensorflow-in-practice-9XcfUv0Y\lib\site-packages\tensorflow\python\eager\function.py", line 598, in call
    ctx=ctx)
  File "C:\Users\Han\.virtualenvs\tensorflow-in-practice-9XcfUv0Y\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InternalError: [_Derived_] Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 64, 64, 1, 1615, 64, 64]
     [[{{node gradients/CudnnRNN_grad/CudnnRNNBackprop}}]]
     [[StatefulPartitionedCall_1]]
     [[gradient_tape/sequential/embedding/embedding_lookup/Reshape/_38]] [Op:__inference_train_function_6172]

Function call stack:
train_function -> train_function -> train_function
```
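The "model config" line in the crash logs quoted above lists the RNN dimensions at the moment of failure (for example, max_seq_length was 1850, 2537 and 1615 in the quoted runs). A minimal sketch for pulling those values out of a log line so that failing runs can be compared, assuming the line keeps the format shown in the logs; `FIELDS` and `parse_rnn_config` are hypothetical names for illustration, not part of TensorFlow:

```python
import re

# Field names exactly as they appear in the quoted TensorFlow log line.
FIELDS = (
    "num_layers", "input_size", "num_units", "dir_count",
    "max_seq_length", "batch_size", "cell_num_units",
)

# Matches the bracketed list of integers that follows the field-name list.
_CONFIG_RE = re.compile(r"cell_num_units\]:\s*\[([\d,\s]+)\]")


def parse_rnn_config(log_line):
    """Return the RNN dimensions from a 'Failed to call ThenRnn...' line.

    Returns a dict keyed by FIELDS, or None if the line has no such config.
    """
    match = _CONFIG_RE.search(log_line)
    if match is None:
        return None
    values = [int(v) for v in match.group(1).split(",")]
    if len(values) != len(FIELDS):
        return None
    return dict(zip(FIELDS, values))
```

Running this over each crash log gives the sequence length and batch size per failure, which makes it easier to see whether the crashes cluster around long padded batches.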

Currently I am on Windows 10 Pro, Build 19041, CUDA 10.1, cuDNN 7.6.5. My GeForce Experience shows driver 450.99. Hardware stack is AMD Threadripper 3960x, Titan RTX, and 2080 Ti.

Which GeForce driver should I downgrade to for cuDNN LSTM/RNN to work?

The earliest version the driver search shows is 441.20.

harriswilliam0 · August 13, 2020, 4:49pm

Has this been fixed??? Where can I download the Studio driver (431.86)? It doesn’t seem to be available on the Official Drivers | NVIDIA page.

harriswilliam0 · August 15, 2020, 2:45pm

I was able to solve this error by installing the GeForce Game Ready Driver 431.60 from https://www.nvidia.com/Download/Find.aspx?lang=en-us, with the Recommended/Beta option set to Recommended/Certified.

The most recent version, 451.67, solved the error for a while; however, it would still occur randomly after a much longer amount of time.

rustequal · August 26, 2020, 12:33pm

I have “Windows 10 (build 2004)” and the same problem with LSTM layers on driver version 452.06. The NVIDIA driver versions 431.86 (Studio) and 431.36 (Game Ready) are not compatible with my version of Windows, so I’ll have to wait for NVIDIA to fix this and release a new driver. Please fix this problem!

salouri · September 8, 2020, 6:52am

Windows 10
TensorFlow 2.2.0
Cuda 10.1
cudnn 8.0.3.33
Nvidia 452.06
The problem still persists even after the recent cuDNN version 8.0.3 was released (23/8/2020).
Downgrading the NVIDIA driver solved this particular issue but created other issues for me in other applications!
Therefore, I am also waiting for NVIDIA to fix this issue. For now I am using Google Colab as an alternative to my local machine whenever I need to use Bidirectional(LSTM) layers…

denis6 · December 3, 2020, 6:18pm

Windows 10
TensorFlow 2.3.0
Cuda 10.1
cudnn 8.0.5.39
Nvidia 457.30 (Studio)

The problem still persists. Can’t use a 3090 to train networks containing LSTM layers :(

harriswilliam0 · February 4, 2021, 3:48pm

I think the new driver 461.40 fixed the error
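To summarize the reports in this thread: 431.60 (Game Ready) and 431.86 (Studio) were the last drivers reported as working, versions from 441.66 through 457.30 were reported as crashing, and 461.40 was reported as possibly fixing the error. Below is a minimal sketch of a check against that range. It assumes these anecdotal reports apply to your setup; they are not an official compatibility list. The function and constant names are hypothetical, and `installed_driver_version` assumes `nvidia-smi` is on the PATH.

```python
import subprocess

# Versions taken from posts in this thread (anecdotal, not official).
LAST_REPORTED_GOOD = (431, 86)    # 431.60 Game Ready / 431.86 Studio worked
FIRST_REPORTED_FIXED = (461, 40)  # one poster thinks this release fixed it


def parse_driver_version(text):
    """Turn a version string such as '441.66' into a comparable (441, 66)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def in_reported_bad_range(version):
    """True if the version lies strictly between the reported good and fixed drivers."""
    return LAST_REPORTED_GOOD < parse_driver_version(version) < FIRST_REPORTED_FIXED


def installed_driver_version():
    """Ask nvidia-smi for the installed driver version (first GPU only)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()[0].strip()
```

For example, `in_reported_bad_range(installed_driver_version())` would flag a machine running 452.06, one of the versions reported as crashing above.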
