cuDNN LSTM is broken above driver 431.60: 'Unexpected Event status: 1 cuda'

CUDA: 10.1
cuDNN: 7.6.4
OS: Windows 10
GPU: RTX 2060

When the model gets more complex, e.g. more than 3 LSTM layers, I randomly get ‘Unexpected Event status: 1 cuda’ on both TensorFlow (2.0) and PyTorch (1.3). The latest drivers I could find that don’t have this problem are 431.60 Game Ready and 431.86 Studio.
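For anyone trying to reproduce this, a model of the kind described might look like the sketch below. It assumes TensorFlow 2.x `tf.keras`, where `tf.keras.layers.LSTM` left at its defaults (tanh activation, sigmoid recurrent activation, no recurrent dropout, `unroll=False`, `use_bias=True`) is eligible for the cuDNN kernel on a GPU. The layer count, unit size, input shape, and helper names are hypothetical placeholders, not the original poster's model.

```python
def lstm_layer_configs(num_layers, units):
    """Keyword arguments for a stack of LSTM layers (hypothetical helper).

    Every layer except the last returns the full sequence so the next
    LSTM receives 3-D input. Activations and dropout are left at their
    defaults so TensorFlow can select the cuDNN kernel on a GPU.
    """
    return [
        {"units": units, "return_sequences": i < num_layers - 1}
        for i in range(num_layers)
    ]


def build_stacked_lstm(num_layers=4, units=128, timesteps=100, features=16):
    # Imported lazily so lstm_layer_configs() works without TensorFlow.
    import tensorflow as tf

    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=(timesteps, features)))
    for kwargs in lstm_layer_configs(num_layers, units):
        model.add(tf.keras.layers.LSTM(**kwargs))
    model.add(tf.keras.layers.Dense(1))
    model.compile(optimizer="adam", loss="mse")
    return model
```

Fitting `build_stacked_lstm()` on random arrays of shape `(N, 100, 16)` is one quick way to check a given driver; since the crash shows up randomly in these reports, a single clean run proves little.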

E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
2019-12-11 12:57:15.515019: E tensorflow/stream_executor/dnn.cc:596] CUDNN_STATUS_INTERNAL_ERROR
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1802): 'cudnnRNNForwardTraining( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, input_desc.handles(), input_data.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), rnn_desc.params_handle(), params.opaque(), output_desc.handles(), output_data->opaque(), output_h_desc.handle(), output_h_data->opaque(), output_c_desc.handle(), output_c_data->opaque(), workspace.opaque(), workspace.size(), reserve_space.opaque(), reserve_space.size())'
2019-12-11 12:57:15.519102: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1

Edit: I thought cuDNN 7.6.5 fixed the problem, but it didn’t.

I was getting the same error with 441.66. I downgraded to 431.86 and now have no issues.

I had the same issues with TF 2.0 and 2.1:

TensorFlow 2.1 / 2.0
CUDA 10.1 / 10.0
cuDNN 7.6.5 / 7.4.x
Windows 10
GTX 1650

Any ideas on how to solve this problem with newer drivers? Is it really a driver issue?

https://www.mathworks.com/matlabcentral/answers/485733-cuda-crashes-when-training-lstm-on-geforce-rtx-2080-super

“Downgrading(!) the NVIDIA driver to the last stable studio driver (431.86) solved the issue.”

Either your output is very long or your batch size isn’t large enough. Try a batch size of 32 and see whether it reaches this fault faster.
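The last log line in the original report starts with `F`, meaning TensorFlow logged it at fatal level and aborted the process, so the error cannot be caught with `try`/`except` inside the training script. One way to compare batch sizes anyway is to run each trial in its own Python process and check the exit code. This sketch assumes a hypothetical script `train_lstm.py` that reads the batch size from a `--batch-size` argument; the function names are illustrative.

```python
import subprocess
import sys


def run_trial(script, batch_size, timeout=None):
    """Run one training trial in a child process.

    Returns True if the child exited with code 0. A fatal TensorFlow
    error aborts only the child, so the sweep itself keeps running.
    """
    result = subprocess.run(
        [sys.executable, script, "--batch-size", str(batch_size)],
        timeout=timeout,
    )
    return result.returncode == 0


def sweep(script, batch_sizes):
    """Map each batch size to whether its trial finished cleanly."""
    return {bs: run_trial(script, bs) for bs in batch_sizes}
```

For example, `sweep("train_lstm.py", [8, 16, 32, 64])`. Because the failure is random, repeating each batch size a few times gives a more honest picture.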

I’ll explain what’s actually happening later, but this is the quickest solution. You need a dedicated GPU for this, or else cuDNN LSTM can’t work at its best.

Either your data leans towards sparse, or the recurrent sequence update is getting very big and the GPU fails to execute the malloc, so it fails to submit the forward pass of your RNN. There is more to this, which I will explain later.
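To put "very big" into rough numbers, here is a back-of-envelope estimate of the activations a stacked LSTM keeps for backpropagation. It assumes that, per timestep, per sample, per layer and direction, training stores six vectors of length `hidden` (four gates plus the hidden and cell states). This is an illustrative approximation only; the reserve buffer that the logged `cudnnRNNForwardTraining` call receives is sized by cuDNN itself (queried via `cudnnGetRNNTrainingReserveSize`) and may differ.

```python
def estimate_lstm_activation_bytes(batch, seq_len, hidden, num_layers,
                                   num_directions=1, bytes_per_value=4,
                                   vectors_per_step=6):
    """Rough size of the activations an LSTM keeps for backprop.

    Assumption (not cuDNN's exact formula): per timestep, per sample,
    per layer and direction, `vectors_per_step` vectors of length
    `hidden` are stored (4 gates + hidden state + cell state by default).
    """
    values = (batch * seq_len * hidden * num_layers
              * num_directions * vectors_per_step)
    return values * bytes_per_value


def mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 ** 2)


# Example: batch 32, 100 timesteps, 256 units, 4 stacked layers, float32.
size = estimate_lstm_activation_bytes(32, 100, 256, 4)
print(f"{size} bytes ~ {mib(size):.1f} MiB")  # 78643200 bytes ~ 75.0 MiB
```

The estimate is linear in every factor, so doubling the sequence length or making the layers bidirectional doubles it.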

import tensorflow as tf  # TensorFlow 1.x API (tf.compat.v1 in TF 2.x)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.95)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

Let me know if this worked for you. I ran into this a lot as well, so this is the basic conclusion I’ve come to.

Could you please let us know if you are still facing this issue?

Thanks

Hi, I think I’m having the same issue.

I tried increasing the batch size on a toy example (though that wouldn’t be possible in a real scenario) and I also tried:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.95
session = InteractiveSession(config=config)

I used this version because your code no longer works on TensorFlow 2.
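For TensorFlow 2.x without the `compat.v1` session API, a roughly equivalent cap can be set by creating a virtual (logical) GPU device with an explicit memory limit. TF 2 takes this limit in megabytes rather than as a fraction, so the sketch converts a fraction to MB; the card's total memory (e.g. 6144 MB for a 6 GB card) is an assumption you supply yourself, and the helper names are hypothetical. It must run before anything touches the GPU.

```python
def memory_limit_mb(total_mb, fraction):
    """Convert a per-process memory fraction into an absolute MB limit."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mb * fraction)


def limit_first_gpu(total_mb, fraction=0.95):
    # Imported lazily so memory_limit_mb() can be used without TensorFlow.
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices("GPU")
    if not gpus:
        return None
    limit = memory_limit_mb(total_mb, fraction)
    # Must be called before the GPU is initialized (before building any
    # model or running any op); otherwise TensorFlow raises RuntimeError.
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=limit)],
    )
    return limit
```

Calling `limit_first_gpu(6144, 0.95)` at the top of the script would cap the first GPU at 5836 MB.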

I’m also unable to downgrade my driver to the 431.86 Studio version; I get the error message “Your system requires a Standard driver package…”

What should I do?

Hi,
Can you try installing the latest cuDNN version with the following setup: CUDA 10.2 and driver r440?

Thanks

I am having a similar issue on TensorFlow 2.2.0 and followed the breadcrumbs here.

Currently I am on Windows 10 Pro, Build 19041, CUDA 10.1, cuDNN 7.6.5. GeForce Experience shows driver 450.99. My hardware stack is an AMD Threadripper 3960X, a Titan RTX, and a 2080 Ti.

Which GeForce driver should I downgrade to for cuDNN LSTM/RNN to work?

The earliest version available from the driver search is 441.20.

Has this been fixed? Where can I download the Studio driver (431.86)? It doesn’t seem to be available on NVIDIA’s official driver download page.

I was able to solve this error by installing GeForce Game Ready Driver 431.60 from https://www.nvidia.com/Download/Find.aspx?lang=en-us with “Recommended/Beta” set to “Recommended/Certified”.

The most recent version, 451.67, fixed the error for a while; however, it still occurred randomly, just after a much longer time.

I have Windows 10 (build 2004) and the same problem with LSTM layers on driver 452.06. Neither NVIDIA driver 431.86 (Studio) nor 431.36 (Game Ready) is compatible with my version of Windows, so I’ll have to wait for NVIDIA to fix this and release a new driver. Please fix this problem!

Windows 10
TensorFlow 2.2.0
CUDA 10.1
cuDNN 8.0.3.33
NVIDIA 452.06

The problem still persists even after the recent cuDNN 8.0.3 release (23/8/2020). Downgrading the NVIDIA driver solved this particular issue but created other issues for me in other applications! Therefore, I am also waiting for NVIDIA to fix this issue. In the meantime, I am using Google Colab instead of my local machine whenever I need Bidirectional(LSTM) layers…

Windows 10
TensorFlow 2.3.0
CUDA 10.1
cuDNN 8.0.5.39
NVIDIA 457.30 (Studio)

The problem still persists. I can’t use my 3090 to train networks containing LSTMs :(

I think the new driver 461.40 fixed the error.
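To see which side of these reports a machine falls on, the sketch below reads the driver version with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and compares it with the versions mentioned in this thread: 431.60/431.86 reported working, 441.66 through 457.30 reported failing, and 461.40 tentatively reported fixed. The thresholds come from the posts here, not from any NVIDIA release note, and the function names are hypothetical, so treat the labels as hints only.

```python
import subprocess

# Thresholds taken from user reports in this thread (Windows, cuDNN LSTM).
LAST_REPORTED_GOOD = (431, 86)
FIRST_REPORTED_FIXED = (461, 40)


def parse_driver_version(text):
    """Turn a string like '452.06' into a comparable tuple (452, 6)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def classify(version):
    """Label a (major, minor) tuple according to the thread's reports."""
    if version <= LAST_REPORTED_GOOD:
        return "reported working"
    if version >= FIRST_REPORTED_FIXED:
        return "reported fixed"
    return "reported affected"


def local_driver_version():
    # nvidia-smi prints one line per GPU; all GPUs share one driver.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_driver_version(out.splitlines()[0])
```

On a machine with an NVIDIA GPU, `classify(local_driver_version())` returns one of the three labels.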
