Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

axel.roebel · May 15, 2020, 12:26pm

Hello

I operate the GPU machines in our team: 5 servers with 4 GPUs and a few workstations with individual GPUs. The following GPU types are in use:

   GeForce GTX 1050 Ti with Max-Q Design
   GeForce GTX 1050 Ti
   GeForce RTX 2080 Ti
   GeForce GTX 1080 Ti
   TITAN Xp COLLECTORS EDITION

We use TensorFlow 1.12, 1.13, 1.14, 2.0 and 2.1, all installed via Anaconda.
Until I installed TF 2.1 everything worked fine. With TF 2.1 most of our training software fails, throwing the cuDNN internal error as soon as the CUDA libraries are loaded. I stress that exactly the same code works fine on TF 2.0! A minimal example that fails is given further down.

More precisely, the installation I tried is tensorflow-gpu=2.1 with cudatoolkit=10.1 from the Anaconda main repos, but I also tried installing tensorflow-gpu via pip, with exactly the same result. I can reproduce this under Ubuntu 18.04 and Debian 9.12 with the cards

   GeForce GTX 1050 Ti with Max-Q Design
   GeForce GTX 1050 Ti
   GeForce RTX 2080 Ti

but on the two other cards available in our team

   GeForce GTX 1080 Ti
   TITAN Xp COLLECTORS EDITION

the very same code runs fine on installations containing the very same TF 2.1/CUDA versions.

Following the discussion in "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR · Issue #24496 · tensorflow/tensorflow · GitHub",
I discovered a workaround that consists in allowing memory growth (see the -a option of the minimal example further down).
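As far as I understand, the same allocator behaviour can also be requested without modifying a script, through the TF_FORCE_GPU_ALLOW_GROWTH environment variable described in the TensorFlow GPU guide. The following is only a minimal sketch under that assumption; the helper name request_gpu_memory_growth is made up for illustration, and I have not verified that it cures the cuDNN error on the affected cards.

```python
import os


def request_gpu_memory_growth(env=os.environ):
    """Ask TensorFlow to grow GPU memory on demand instead of reserving it all.

    Hypothetical helper: it only sets the environment variable. It has to run
    before TensorFlow initializes the GPUs, so call it before the first op.
    """
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env["TF_FORCE_GPU_ALLOW_GROWTH"]


# Intended usage (not executed here, since it needs TensorFlow and a GPU):
#
#   request_gpu_memory_growth()
#   import tensorflow as tf
#   tf.nn.conv2d(...)
```

Setting the variable in the shell or in the job scheduler before launching the training script should have the same effect, which may be convenient when many existing scripts are involved.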

Interestingly, one of the people commenting on that bug report managed to work around the problem by installing the latest driver:

github.com/tensorflow/tensorflow

Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

opened 06:29AM - 21 Dec 18 UTC

closed 07:05PM - 05 Aug 21 UTC

michaelmyc

stat:awaiting response type:bug stale comp:gpu TF 2.0

<em>Please make sure that this is a bug. As per our [GitHub Policy](https://gith…ub.com/tensorflow/tensorflow/blob/master/ISSUES.md), we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template</em>

**System information**
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes and No (described below)
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Manjaro
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- TensorFlow installed from (source or binary): tf-nightly-gpu (Dec 19, r1.13)
- TensorFlow version (use command below): 1.13.0-dev20181219
- Python version: 3.7.1
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: CUDA 10 with cuDNN 7.4.1
- GPU model and memory: RTX 2070 8GB

**Describe the current behavior**
I'm running the CNN model on MNIST. When I'm running with the GPU, I am encountering
```2018-12-20 20:09:13.644176: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR```

I did some digging and realized that it is a memory issue (which shouldn't be the case as I have 32GB of RAM and 64GB of swap. I ran htop when running the model and I have 20+GB free, which is more than enough to fit the 8GB vRAM mappings.

Using the `gpu_options.allow_growth = True` gets the model to work properly, and setting `os.environ['CUDA_VISIBLE_DEVICES'] = '-1'` also works. This means that I AM facing a memory issue, but I don't see how.

Also, using `gpu_options.allow_growth = True` does not fix the same issue when trying to run tensorflow/models/official/mnist/ model, which should have a similar behavior with my code.

**Code to reproduce the issue**
```
import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import math
import time

# Killing optional CPU driver warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
tf.logging.set_verbosity(tf.logging.ERROR)


class Model:

    def __init__(self, image, label):
        """
        A Model class contains a computational graph that classifies images
        to predictions. Each of its methods builds part of the graph
        on Model initialization. Do not modify the constructor, as doing so
        would break the autograder. You may, however, add class variables
        to use in your graph-building. e.g. learning rate,

        image: the input image to the computational graph as a tensor
        label: the correct label of an image as a tensor
        prediction: the output prediction of the computational graph,
                    produced by self.forward_pass()
        optimize: the model's optimizing tensor produced by self.optimizer()
        loss: the model's loss produced by computing self.loss_function()
        accuracy: the model's prediction accuracy
        """
        self.image = image
        self.label = label

        # TO-DO: Add any class variables you want to use.

        self.prediction = self.forward_pass()
        self.loss = self.loss_function()
        self.optimize = self.optimizer()
        self.accuracy = self.accuracy_function()

    def forward_pass(self):
        """
        Predicts a label given an image using convolution layers

        :return: the prediction as a tensor
        """
        filter_1 = tf.Variable(tf.truncated_normal([3, 3, 1, 8], stddev=0.1))
        conv_1 = tf.nn.conv2d(self.image, filter_1, [1, 1, 1, 1], "SAME")

        reshaped = tf.reshape(conv_1, shape=[50, -1])

        L1 = reshaped.shape[1].value
        L2 = 500
        W1 = tf.Variable(tf.random_normal([L1, L2], mean=0, stddev=0.01))
        b1 = tf.Variable(tf.random_normal([L2], mean=0, stddev=0.01))
        relu_1 = tf.nn.relu(tf.matmul(reshaped, W1) + b1)

        W2 = tf.Variable(tf.random_normal([L2, 10], mean=0, stddev=0.01))
        b2 = tf.Variable(tf.random_normal([10], mean=0, stddev=0.01))
        logits = tf.nn.relu(tf.matmul(relu_1, W2) + b2)
        return logits

    def loss_function(self):
        """
        Calculates the model cross-entropy loss

        :return: the loss of the model as a tensor
        """
        loss = tf.losses.softmax_cross_entropy(onehot_labels=self.label, logits=self.prediction)
        return loss

    def optimizer(self):
        """
        Optimizes the model loss using an Adam Optimizer

        :return: the optimizer as a tensor
        """
        learning_rate = 0.1
        sgd = tf.train.GradientDescentOptimizer(learning_rate)
        train = sgd.minimize(self.loss)
        return train

    def accuracy_function(self):
        """
        Calculates the model's prediction accuracy by comparing
        predictions to correct labels – no need to modify this

        :return: the accuracy of the model as a tensor
        """
        correct_prediction = tf.equal(tf.argmax(self.prediction, 1),
                                      tf.argmax(self.label, 1))
        return tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


def main():
    t_start = time.time()

    mnist = input_data.read_data_sets("data/mnist/", one_hot=True)
    batch_sz = 50
    batch = 2000

    inputs = tf.placeholder(shape=[batch_sz, 28, 28, 1], dtype=tf.float32)
    labels = tf.placeholder(shape=[batch_sz, 10], dtype=tf.float32)

    model = Model(inputs, labels)

    session_config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
    sess = tf.Session(config=session_config)

    # sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    for i in range(batch):
        next_image, next_label = mnist.train.next_batch(batch_sz)
        next_image = next_image.reshape((batch_sz, 28, 28, 1))
        sess.run(model.optimize, feed_dict={inputs: next_image, labels: next_label})

    acc, test_images, test_labels = 0, mnist.test.images, mnist.test.labels
    test_batch = math.ceil(len(test_images) / batch_sz)
    for i in range(test_batch):
        batch_images = test_images[i * batch_sz: (i + 1) * batch_sz]
        batch_images = batch_images.reshape((batch_sz, 28, 28, 1))
        batch_labes = test_labels[i * batch_sz: (i + 1) * batch_sz]
        acc += sess.run(model.accuracy, feed_dict={inputs: batch_images, labels: batch_labes})
    acc /= test_batch
    print(acc)

    print(time.time() - t_start, 'seconds')

    return


if __name__ == '__main__':
    main()
```

Installing the latest driver (445.87) for my RTX 2080 solved this issue for me.

Unfortunately, the latest driver for Linux is not version 445.87, and after installing the latest driver available for my computer I could not see any change.
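To compare what is actually installed on each machine, the driver version can be read per GPU with nvidia-smi. The following is a minimal sketch, not part of my original test: the helper names parse_gpu_report and query_gpus are made up for illustration, and it assumes that nvidia-smi is on the PATH of the machine being checked.

```python
import subprocess

# nvidia-smi query that prints one "name, driver_version" line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_report(text):
    """Turn the CSV output of the query into (gpu_name, driver_version) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at the last comma: the driver version is the final field.
        name, _, driver = line.rpartition(",")
        rows.append((name.strip(), driver.strip()))
    return rows


def query_gpus():
    """Run nvidia-smi on this host; return [] if the tool is missing or fails."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_report(result.stdout)


if __name__ == "__main__":
    for name, driver in query_gpus():
        print(f"{name}: driver {driver}")
```

Running this on every server and workstation gives a quick table of card versus driver, which makes it easier to see whether the failing cards share a driver version with the working ones.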

My minimal example is below. Interestingly, the problem is not specific to conv2d: I can change the order of the three operations and it is always the third one that fails. Allowing growth by adding the command-line option -a makes the script finish without problems on our TF 2.1 installation.

import sys
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus and len(sys.argv) > 1 and sys.argv[1].startswith("-a"):
    print("allowing growth")
    growth = True
else:
    print("nogrowth")
    growth = False

try:
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, growth)  # must precede GPU initialization
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')  # initializes the GPUs
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
except RuntimeError as e:
    print(e)

tf.matmul(tf.zeros((2,2,2)), tf.zeros((2,2,2)))
tf.signal.stft(tf.zeros(3000, dtype=tf.float32), 512, 128)
tf.nn.conv2d(tf.zeros((2,20,20,20), dtype=tf.float32),
             filters=tf.zeros((2,2,20,20), dtype=tf.float32),
             strides=(1,1,1,1), padding="VALID")
print("done")

The last lines of the log (run without -a) are as follows:

2020-03-06 17:06:48.920491: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-06 17:06:49.029343: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-06 17:06:49.473013: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-03-06 17:06:49.474368: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
nogrowth
1 Physical GPUs, 1 Logical GPUs
Traceback (most recent call last):
  File "./run_cuda_con2d_last.py", line 24, in <module>
    strides=(1,1,1,1), padding="VALID")
  File "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 1914, in conv2d_v2
    name=name)
  File "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/ops/nn_ops.py", line 2011, in conv2d
    name=name)
  File "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_nn_ops.py", line 937, in conv2d
    _ops.raise_from_not_ok_status(e, name)
  File "/data/anasynth/anaconda3/envs/tf2.1/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 6606, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
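Since the same test has to be repeated on several machines, it can help to scan the captured logs automatically for the handle-creation failure. The following is a minimal sketch, assuming the log has been saved as text; the function name cudnn_failures is made up for illustration, and the pattern is simply taken from the error lines shown above.

```python
import re

# Matches the error emitted by cuda_dnn.cc and captures the cuDNN status code.
CUDNN_FAILURE = re.compile(r"Could not create cudnn handle: (CUDNN_STATUS_\w+)")


def cudnn_failures(log_text):
    """Return the cuDNN status codes of all handle-creation failures in a log."""
    return CUDNN_FAILURE.findall(log_text)


if __name__ == "__main__":
    log = (
        "2020-03-06 17:06:49.029343: I tensorflow/stream_executor/platform/default/"
        "dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7\n"
        "2020-03-06 17:06:49.473013: E tensorflow/stream_executor/cuda/"
        "cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR\n"
    )
    print(cudnn_failures(log))
```

An empty result means the log contains no such failure, so the check can be used as a pass/fail criterion per machine.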

I don't see a way to provide full log files here. You can find them, for various orderings of the different operations, in the GitHub issue:

github.com/tensorflow/tensorflow

Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

Any help would be much appreciated.
Axel

axel.roebel · May 18, 2020, 6:37pm

Just so that nobody believes my installation is broken:

I just repeated the test with the official tensorflow/tensorflow:2.1.0-gpu-py3 Docker image.
It has exactly the same problem.

Thanks
Axel
