CUDNN_STATUS_INTERNAL_ERROR when using convolution

ricardocuenk · January 28, 2018, 9:38pm

Hello!

After updating from Cuda 8.0 and cudnn 6.0 to Cuda 9.0 and cudnn 7.0 and updating the driver to current version 384.111 I can no longer execute scripts that use a convolution.

Executing scripts using RNN’s or fully connected models with PyTorch (v0.3) and TensorFlow (v1.5) works fine, however if the model contains a convolutional layer in either framework the script fails.

Using PyTorch the error is:
RuntimeError: CUDNN_STATUS_INTERNAL_ERROR

With TensorFlow the error is:
2018-01-28 15:07:26.985395: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2018-01-28 15:07:26.985424: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-01-28 15:07:26.985432: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
Aborted (core dumped)

Furthermore if the PyTorch script with convolutional layers is executed as superuser using “sudo $(which python) script.py” it works as expected. But this workaround doesn’t work with TensorFlow because some links are broken, it probably does work in PyTorch because it is distributed with its own binaries for Cuda and cudnn.

This leads me to believe that the issue is with the driver’s installation, however after several attempts at reinstalling the driver and downgrading back to version 384.90, it still doesn’t work.
nvidia-smi is also working fine and displays the correct information.

I am out of ideas. Is there anything I could try to fix this?

Note: I am using python 3.5 with anaconda and I’ve tested this in several environments and tried reinstalling and building from source both frameworks.
Both Cuda and the drivers were installed using the runfiles.
System: Ubuntu 16.04
GPU: GTX 1080

miguel.sanchez.nan · February 11, 2018, 6:57am

I had the same problem and solved it by removing the .nv folder (some Nvidia cache) from my home directory.

Miguel

maflister · February 13, 2018, 1:38pm

Experiencing the same error with TensorFlow 1.4.1, CUDA 8.0, and cudnn 6.0. Cannot run convnets. Tried hte .nv solution with no success.

ricardocuenk · February 13, 2018, 5:56pm

Thanks!

Removing the .nv directory solved my problem too!
Both PyTorch 0.3 and Tensorflow 1.5 are working fine now even with convolution.

Btw I also changed to driver version 390.25 and I am using Cuda 9.0 and cudnn 7.

Topic		Replies	Views
Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR cuDNN	3	8103	November 7, 2019
tensorflow/stream_executor/cuda/cuda_dnn.cc:329 CUDA Setup and Installation	2	3672	February 18, 2020
TensorFlow loads the wrong cuDNN version CUDA Setup and Installation cudnn	2	3912	October 16, 2023
Tensoflow error: Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED CUDA Developer Tools	0	1265	January 25, 2021
Failed to get convolution algorithm. This is probably because cuDNN failed to initialize cuDNN	29	51593	October 12, 2021
Issues with Tensorflow on CUDA10 and RTX2080 CUDA Setup and Installation	3	4363	March 6, 2019
cuDNN Error cuDNN	1	932	April 25, 2019
Use gpu for tensorflow, crashes CUDA Setup and Installation tensorflow	14	6445	March 7, 2024
problem in Cudnn version change cuDNN	2	2082	November 8, 2019
"Failed to get convolution algorithm" problem cuDNN	4	8488	September 7, 2019

CUDNN_STATUS_INTERNAL_ERROR when using convolution

Related topics