"Failed to get convolution algorithm" problem

edwardliang11 · March 13, 2019, 3:35am

Hi,

I was trying to run a deep learning program using cuDNN 7.4.2, TensorFlow 1.13.1 and CUDA 10.0. It used to work, but today I got the following error:

2019-03-13 11:33:41.129307: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-03-13 11:33:42.438705: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-03-13 11:33:42.446073: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "FCN.py", line 122, in <module>
    verbose=1, validation_data=(x_test, Y_test), callbacks = [reduce_lr])
  File "/home/el/anaconda3/envs/tensorflow/lib/python3.5/site-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/home/el/anaconda3/envs/tensorflow/lib/python3.5/site-packages/keras/engine/training_arrays.py", line 200, in fit_loop
    outs = f(ins_batch)
  File "/home/el/anaconda3/envs/tensorflow/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/home/el/anaconda3/envs/tensorflow/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/el/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/home/el/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node conv2d_1/convolution}}]]
         [[{{node metrics/acc/Mean}}]]

How can I resolve this issue? Thank you!

andrewli2003 · April 16, 2019, 8:51pm

This fixed the same problem for me.
https://devtalk.nvidia.com/default/topic/1021858/jetson-tx2/tensorflow-memory-error/
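import tensorflow as tf
# Let TensorFlow allocate GPU memory on demand instead of reserving it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)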
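
For reference, the same option can also be requested without touching the session code. The sketch below assumes a TensorFlow release that honours the TF_FORCE_GPU_ALLOW_GROWTH environment variable; this has not been checked against the 1.13.1 build used in this thread, so treat it as an alternative to try rather than a confirmed fix. The helper name request_gpu_memory_growth is made up for illustration.

```python
import os


def request_gpu_memory_growth(env=None):
    """Ask TensorFlow for on-demand GPU allocation via an environment variable.

    Hypothetical helper: it only sets TF_FORCE_GPU_ALLOW_GROWTH if the user has
    not already chosen a value, and returns the value that ends up in effect.
    """
    env = os.environ if env is None else env
    env.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
    return env["TF_FORCE_GPU_ALLOW_GROWTH"]


# The variable has to be in place before TensorFlow initialises the GPU;
# setting it before the import is the simplest way to guarantee that.
request_gpu_memory_growth()
# import tensorflow as tf   # import only after the variable is set
```

Setting the variable in the shell that launches the script or notebook server has the same effect as calling the helper.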

gaoxu515 · April 18, 2019, 2:43am

I tried this code and it worked. Thanks!
But then I found that the pooling layer's ReLU function in the convolution algorithm didn't work on the GPU. How can I solve that?

michael.gschwind · September 6, 2019, 11:41pm

The suggested solution did not work for me :(

Many others are reporting the same problem.

Cross-references:

https://devtalk.nvidia.com/default/topic/1062664/cudnn/problem-with-1d-convolutions-under-keras-/

https://devtalk.nvidia.com/default/topic/1062190/cudnn/cudnn-failed-to-initialize/

https://devtalk.nvidia.com/default/topic/1043867/cudnn/failed-to-get-convolution-algorithm-this-is-probably-because-cudnn-failed-to-initialize/

https://devtalk.nvidia.com/default/topic/1055928/cudnn/error-failed-to-get-convolution-algorithm-this-is-probably-because-cudnn-failed-to-initialize-so-/

https://devtalk.nvidia.com/default/topic/1051380/cudnn/could-not-create-cudnn-handle-cudnn_status_internal_error/

michael.gschwind · September 7, 2019, 7:11pm

So, after some more experimentation, a reboot and the following sequence made the 1D convolution work.

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))

The thing to highlight is that this required a full reboot, and that this sequence was the first thing executed.

It did not work previously when I tried it without a reboot. Even shutting down and restarting Jupyter Notebook did not help.
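
One possible reason a reboot helped (an assumption, the thread does not confirm it) is that some other process was still holding GPU memory. A minimal sketch for checking that before rebooting is to ask nvidia-smi which compute processes are using the GPU. The function names are hypothetical; the query fields pid, process_name and used_memory are standard nvidia-smi options.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' CSV rows into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        try:
            # pid is the first field and the memory figure the last, so a
            # process name containing commas is still handled.
            pid, rest = line.split(",", 1)
            name, used = rest.rsplit(",", 1)
            rows.append({"pid": int(pid.strip()),
                         "name": name.strip(),
                         "used_mib": int(used.strip())})
        except ValueError:
            continue  # skip malformed rows or values such as "[N/A]"
    return rows


def gpu_compute_processes():
    """Return the processes currently holding GPU memory, or [] without nvidia-smi."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
        universal_newlines=True, check=False)
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    for proc in gpu_compute_processes():
        print("{pid}\t{used_mib} MiB\t{name}".format(**proc))
```

If a stale Python or notebook kernel process shows up in this list, stopping it may free the memory without a full reboot.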

Here’s what I have installed for reference, with a GTX 1660 Ti on an ASUS ROG Strix laptop under Ubuntu 18.04.

$ sudo dpkg -i libcudnn7_7.4.1.5-1+cuda10.0_amd64.deb libcudnn7-dev_7.4.1.5-1+cuda10.0_amd64.deb libcudnn7-doc_7.4.1.5-1+cuda10.0_amd64.deb
$ pip3 install --upgrade tensorflow-gpu==1.13.1
$ nvidia-smi
Sat Sep 7 12:02:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P0    33W /  N/A |   5011MiB /  5944MiB |     17%      Default |
+-------------------------------+----------------------+----------------------+

==============================================================================

[1]
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))

from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop
[…]

model = Sequential()
model.add(layers.Conv1D(32, 5, activation='relu',
                        input_shape=(None, float_data.shape[-1])))
model.add(layers.MaxPooling1D(3))
model.add(layers.Conv1D(32, 5, activation='relu'))
model.add(layers.MaxPooling1D(3))
model.add(layers.Conv1D(32, 5, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))

model.compile(optimizer=RMSprop(), loss='mae')

history = model.fit_generator(train_gen,
                              steps_per_epoch=500,
                              epochs=20,
                              validation_data=val_gen,
                              validation_steps=val_steps)

2019-09-07 12:01:14.981980: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-07 12:01:14.982624: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2fc87c0 executing computations on platform CUDA. Devices:
2019-09-07 12:01:14.982643: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1660 Ti, Compute Capability 7.5
2019-09-07 12:01:15.012160: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
2019-09-07 12:01:15.013817: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2fef210 executing computations on platform Host. Devices:
2019-09-07 12:01:15.013898: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2019-09-07 12:01:15.014242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
totalMemory: 5.80GiB freeMemory: 5.35GiB

[1]
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))

from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop
[…]
model = Sequential()
model.add(layers.Embedding(max_features, 128, input_length=max_len))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))

model.summary()

model.compile(optimizer=RMSprop(lr=1e-4),
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)

2019-09-07 12:13:01.916734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
totalMemory: 5.80GiB freeMemory: 838.06MiB
2019-09-07 12:13:01.916783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-09-07 12:13:01.917183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-07 12:13:01.917192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-09-07 12:13:01.917199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-09-07 12:13:01.917248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 613 MB memory) → physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-09-07 12:13:27.897105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-09-07 12:13:27.897140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-07 12:13:27.897146: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-09-07 12:13:27.897150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-09-07 12:13:27.897199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 613 MB memory) → physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-09-07 12:13:28.416804: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-09-07 12:13:29.495347: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 650.38MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-09-07 12:13:29.495943: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 650.38MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-09-07 12:13:29.503194: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

[etc etc.]
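
The two logs above differ in one telling detail: the first reports freeMemory: 5.35GiB, the second only 838.06MiB out of 5.80GiB, which matches the 5011MiB already in use in the nvidia-smi output. A minimal sketch for pulling those figures out of a TensorFlow log, so a script can warn when the GPU is already mostly occupied, might look like this. The function names are hypothetical, and the regular expression assumes the "totalMemory: ... freeMemory: ..." wording shown in these 1.13.1 logs.

```python
import re

# Conversion factors to MiB for the units TensorFlow prints in these logs.
_UNITS_TO_MIB = {"KiB": 1.0 / 1024, "MiB": 1.0, "GiB": 1024.0}
_MEM_RE = re.compile(
    r"totalMemory:\s*([\d.]+)(KiB|MiB|GiB)\s+freeMemory:\s*([\d.]+)(KiB|MiB|GiB)")


def parse_gpu_memory(log_text):
    """Return (total_mib, free_mib) from the first matching log line, or None."""
    match = _MEM_RE.search(log_text)
    if match is None:
        return None
    total = float(match.group(1)) * _UNITS_TO_MIB[match.group(2)]
    free = float(match.group(3)) * _UNITS_TO_MIB[match.group(4)]
    return total, free


def free_fraction(log_text):
    """Return free/total GPU memory as a fraction, or None if not found."""
    parsed = parse_gpu_memory(log_text)
    if parsed is None:
        return None
    total, free = parsed
    return free / total


if __name__ == "__main__":
    line = "totalMemory: 5.80GiB freeMemory: 838.06MiB"
    print("free fraction: {:.2f}".format(free_fraction(line)))
```

A low free fraction at startup suggests that another process (for example a second notebook kernel) is already holding most of the GPU memory.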
