tensorflow/stream_executor/cuda/cuda_dnn.cc:329

Setup: CentOS 7, RTX 2060 (6 GB), tensorflow_gpu 1.14 and 1.15, CUDA 10.0.130, cuDNN 7.6.0, 7.6.4, and 7.4.2.
Why does this fail, and what does "cuda_dnn.cc:329" mean? What is the ":329"?
2019-10-17 23:47:09.321289: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-10-17 23:47:10.553186: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-10-17 23:47:10.558872: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
INFO:tensorflow:Error reported to Coordinator: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node FeatureExtractor/MobilenetV2/Conv/Conv2D (defined at /usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[Loss/classification_loss/_1031]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node FeatureExtractor/MobilenetV2/Conv/Conv2D (defined at /usr/local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

cuda_dnn.cc is the name of a source file in the TensorFlow (TF) source tree.

The full path to that file within the TF source tree is tensorflow/stream_executor/cuda/cuda_dnn.cc.

329 is the line number in that file where the error was reported.
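To make the "file:line]" convention concrete, here is a small sketch that splits one of the log lines from the question into its parts. The regular expression and the helper name parse_tf_log_line are my own, based only on the lines shown in this thread; other TF versions may format their logs differently.

```python
import re

# Assumed layout, inferred from the log lines in this thread:
#   "<date> <time>: <severity> <source path>:<line number>] <message>"
LOG_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:.]+): "
    r"(?P<severity>[IWEF]) (?P<path>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)

def parse_tf_log_line(text):
    """Return the fields of a TF C++ log line as a dict, or None if it does not match."""
    m = LOG_RE.match(text)
    if m is None:
        return None
    fields = m.groupdict()
    fields["line"] = int(fields["line"])  # line number inside the source file
    return fields

sample = (
    "2019-10-17 23:47:10.553186: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] "
    "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR"
)
info = parse_tf_log_line(sample)
print(info["severity"], info["path"], info["line"])
```

The single letter before the path is the severity (I = info, W = warning, E = error, F = fatal), so the "E" lines are the ones to read first.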

These problems are often caused by an installation problem (wrong or improperly installed libraries) with cuDNN, cuBLAS, or CUDA.

It’s impossible to identify specifically what is incorrectly installed from this error output. The usual advice is to do a proper install of each component, using the latest components, and verify the install of each component, before attempting to use TF.
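One quick way to check the cuDNN part of the install, independent of TF, is to load the shared library directly and ask it for its version. This is only a sketch: the library name libcudnn.so.7 is taken from the log above, the helper names are mine, and the version decoding assumes the cuDNN 7.x scheme (major*1000 + minor*100 + patch).

```python
import ctypes

def decode_cudnn_version(v):
    """Split a cuDNN 7.x version integer (e.g. 7604) into (major, minor, patch)."""
    return (v // 1000, (v % 1000) // 100, v % 100)

def loaded_cudnn_version(soname="libcudnn.so.7"):
    """Return the version of the cuDNN library the loader finds, or None if it cannot be loaded."""
    try:
        lib = ctypes.CDLL(soname)
    except OSError:
        # Library is missing or not on the loader search path.
        return None
    # cudnnGetVersion() takes no arguments and returns a size_t.
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    return decode_cudnn_version(lib.cudnnGetVersion())

print(loaded_cudnn_version())
```

If this prints None, or a version other than the one you think you installed, the loader is picking up a different library (or none) and that is worth fixing before looking at TF itself. It does not prove the install is correct, it only rules out one common mismatch.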

If you have done that and TF still fails, it may be a TF issue. In that case you may wish to try the latest version of TF. If you still have trouble, you can file a TF issue. TF is not an NVIDIA product. You may also wish to do more searching (perhaps on the specific error text you have posted here) to see if others have run into this issue.

CUDNN specific questions have their own sub-forum here:
https://devtalk.nvidia.com/default/board/305/cudnn/

I was having this same problem as detailed here:

https://github.com/tensorflow/tensorflow/issues/24496

and solved it with a solution posted in that issue.

I don't know why it works, but for TensorFlow 2, add this code at the start of the program, before the GPU is initialized:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Memory growth must be set before the GPUs are initialized.
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Raised if the GPUs were already initialized.
        print(e)
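The original question is about TF 1.14/1.15, where graph-mode code usually sets the same option through the session config instead. A minimal sketch, assuming your training script lets you pass in your own session or config (the helper name make_growth_session is hypothetical). The TF_FORCE_GPU_ALLOW_GROWTH environment variable is described in the TF GPU guide; whether your exact TF build honors it is something to verify.

```python
import os

# Ask TF to allocate GPU memory on demand. This must be set before
# TensorFlow is imported; setdefault keeps any value already set in the shell.
os.environ.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")

def make_growth_session():
    """Create a TF 1.x-style session that grows GPU memory as needed."""
    import tensorflow as tf  # imported here so the env var above is already in place

    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True  # do not reserve all GPU memory up front
    return tf.compat.v1.Session(config=config)
```

If the script builds its session internally (as some training tools do), the environment variable is the only part of this you can apply without editing that script.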
