Hello,
Last week I trained a simple U-Net model and everything worked fine. Today the same script fails with one of two cuDNN errors:
Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
or
Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
All cuDNN files are where they should be, according to the NVIDIA Deep Learning cuDNN Installation Guide.
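One way to double-check the "not found" variant of the error is to search the directories on PATH for the DLL. The sketch below is only an approximation: `find_on_path` is a helper name made up for this post, and Windows also looks in places other than PATH (for example the application directory and system directories), so an empty result is a hint, not proof.

```python
import os


def find_on_path(filename, path=None):
    """Return every directory in the search path that contains `filename`."""
    if path is None:
        # Default to the PATH of the current process.
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # Skip empty entries produced by stray separators.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # The file name is taken from the error message above.
    print(find_on_path("cudnn64_7.dll"))
```

If more than one directory is printed, several cuDNN copies are visible to the process, which might be worth ruling out.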
Here’s a complete traceback:
2020-03-02 11:57:31.599147: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2020-03-02 11:57:33.909352: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-03-02 11:57:33.913881: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-03-02 11:57:34.042894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 11:57:34.043058: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 11:57:34.043651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 11:57:34.822207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-02 11:57:34.822313: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-03-02 11:57:34.822373: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-03-02 11:57:34.823052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:41:00.0, compute capability: 6.1)
2020-03-02 11:57:34.869434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 11:57:34.869560: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 11:57:34.869970: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 11:57:34.870456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 11:57:34.870568: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 11:57:34.870962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 11:57:34.871190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-02 11:57:34.871270: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-03-02 11:57:34.871321: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-03-02 11:57:34.871786: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:41:00.0, compute capability: 6.1)
Epoch 1/100
2020-03-02 11:57:36.180060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.607
pciBusID: 0000:41:00.0
2020-03-02 11:57:36.180186: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-03-02 11:57:36.180585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-03-02 11:57:36.180695: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-02 11:57:36.180773: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-03-02 11:57:36.180823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-03-02 11:57:36.181239: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 8784 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:41:00.0, compute capability: 6.1)
2020-03-02 11:57:36.420303: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-03-02 11:57:36.929753: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-03-02 11:57:36.930244: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
File "E:/Explorium/python/unet_trainer.py", line 80, in <module>
results = model.fit_generator(train_generator, epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH, validation_data=val_generator, validation_steps=VALIDATION_STEPS, callbacks=callbacks)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 1297, in fit_generator
steps_name='steps_per_epoch')
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_generator.py", line 265, in model_iteration
batch_outs = batch_function(*batch_data)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 973, in train_on_batch
class_weight=class_weight, reset_metrics=reset_metrics)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 264, in train_on_batch
output_loss_metrics=model._output_loss_metrics)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 311, in train_on_batch
output_loss_metrics=output_loss_metrics))
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 252, in _process_single_batch
training=training))
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 127, in _model_loss
outs = model(inputs, **kwargs)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
outputs = self.call(cast_inputs, *args, **kwargs)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 708, in call
convert_kwargs_to_constants=base_layer_utils.call_context().saving)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 860, in _run_internal_graph
output_tensors = layer(computed_tensors, **kwargs)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\engine\base_layer.py", line 891, in __call__
outputs = self.call(cast_inputs, *args, **kwargs)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\keras\layers\convolutional.py", line 197, in call
outputs = self._convolution_op(inputs, self.kernel)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 1134, in __call__
return self.conv_op(inp, filter)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 639, in __call__
return self.call(inp, filter)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 238, in __call__
name=self.name)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\nn_ops.py", line 2010, in conv2d
name=name)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1031, in conv2d
data_format=data_format, dilations=dilations, name=name, ctx=_ctx)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\ops\gen_nn_ops.py", line 1130, in conv2d_eager_fallback
ctx=_ctx, name=name)
File "C:\Users\E-soft\Anaconda3\envs\Explorium\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [Op:Conv2D]
fatal : Memory allocation failure
Process finished with exit code 1
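Most of the log above is info-level (`I`) output. A small sketch for pulling out only the warning, error and fatal lines, assuming every TensorFlow log line follows the `timestamp: LEVEL source] message` layout seen above (`LOG_LINE` and `lines_at_level` are names made up for this post):

```python
import re

# Matches lines such as
# "2020-03-02 11:57:36.929753: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not ..."
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<level>[IWEF]) (?P<source>\S+)\] (?P<message>.*)$"
)


def lines_at_level(log_text, levels=("W", "E", "F")):
    """Return (level, source, message) for every log line at one of `levels`."""
    selected = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        # Lines that do not look like TensorFlow log lines (e.g. the traceback) are skipped.
        if match and match.group("level") in levels:
            selected.append(
                (match.group("level"), match.group("source"), match.group("message"))
            )
    return selected
```

Applied to the log above, this leaves only the two `Could not create cudnn handle` lines from `cuda_dnn.cc:329`.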
Here’s my model:
import numpy as np
import os
import cv2
import random
from tensorflow.python.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam
from tensorflow.python.keras.models import Model
from tensorflow.python.keras.layers import Input, BatchNormalization, Activation, Dropout
from tensorflow.python.keras.layers.convolutional import Conv2D, Conv2DTranspose
from tensorflow.python.keras.layers.pooling import MaxPooling2D
from tensorflow.python.keras.layers.merge import concatenate
import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
def data_gen(templates_folder, masks_folder, image_width, batch_size):
    counter = 0
    images_list = os.listdir(templates_folder)
    random.shuffle(images_list)
    while True:
        templates_pack = np.zeros((batch_size, image_width, image_width, 3)).astype('float')
        masks_pack = np.zeros((batch_size, image_width, image_width, 1)).astype('float')
        for i in range(counter, counter + batch_size):
            template = cv2.imread(templates_folder + '/' + images_list[i]) / 255.
            templates_pack[i - counter] = template
            mask = cv2.imread(masks_folder + '/' + images_list[i], cv2.IMREAD_GRAYSCALE) / 255.
            mask = mask.reshape(image_width, image_width, 1)  # Add extra dimension for parity with template size [1536 * 1536 * 3]
            masks_pack[i - counter] = mask
        counter += batch_size
        if counter + batch_size >= len(images_list):
            counter = 0
            random.shuffle(images_list)
        yield templates_pack, masks_pack
def get_unet(input_image, n_filters, kernel_size, dropout=0.5):
    conv_1 = Conv2D(filters=n_filters, kernel_size=(kernel_size, kernel_size), data_format="channels_last", activation='relu', kernel_initializer="he_normal", padding="same")(input_image)
    conv_1 = BatchNormalization()(conv_1)
    conv_2 = Conv2D(filters=n_filters, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_1)
    conv_2 = BatchNormalization()(conv_2)
    pool_1 = MaxPooling2D(pool_size=(2, 2))(conv_2)
    pool_1 = Dropout(dropout * 0.5)(pool_1)
    conv_3 = Conv2D(filters=n_filters * 2, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(pool_1)
    conv_3 = BatchNormalization()(conv_3)
    conv_4 = Conv2D(filters=n_filters * 2, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_3)
    conv_4 = BatchNormalization()(conv_4)
    pool_2 = MaxPooling2D(pool_size=(2, 2))(conv_4)
    pool_2 = Dropout(dropout)(pool_2)
    conv_5 = Conv2D(filters=n_filters * 4, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(pool_2)
    conv_5 = BatchNormalization()(conv_5)
    conv_6 = Conv2D(filters=n_filters * 4, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_5)
    conv_6 = BatchNormalization()(conv_6)
    pool_3 = MaxPooling2D(pool_size=(2, 2))(conv_6)
    pool_3 = Dropout(dropout)(pool_3)
    conv_7 = Conv2D(filters=n_filters * 8, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(pool_3)
    conv_7 = BatchNormalization()(conv_7)
    conv_8 = Conv2D(filters=n_filters * 8, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_7)
    conv_8 = BatchNormalization()(conv_8)
    pool_4 = MaxPooling2D(pool_size=(2, 2))(conv_8)
    pool_4 = Dropout(dropout)(pool_4)
    conv_9 = Conv2D(filters=n_filters * 16, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(pool_4)
    conv_9 = BatchNormalization()(conv_9)
    conv_10 = Conv2D(filters=n_filters * 16, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_9)
    conv_10 = BatchNormalization()(conv_10)
    upconv_1 = Conv2DTranspose(n_filters * 8, (kernel_size, kernel_size), strides=(2, 2), padding='same')(conv_10)
    concat_1 = concatenate([upconv_1, conv_8])
    concat_1 = Dropout(dropout)(concat_1)
    conv_11 = Conv2D(filters=n_filters * 8, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(concat_1)
    conv_11 = BatchNormalization()(conv_11)
    conv_12 = Conv2D(filters=n_filters * 8, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_11)
    conv_12 = BatchNormalization()(conv_12)
    upconv_2 = Conv2DTranspose(n_filters * 4, (kernel_size, kernel_size), strides=(2, 2), padding='same')(conv_12)
    concat_2 = concatenate([upconv_2, conv_6])
    concat_2 = Dropout(dropout)(concat_2)
    conv_13 = Conv2D(filters=n_filters * 4, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(concat_2)
    conv_13 = BatchNormalization()(conv_13)
    conv_14 = Conv2D(filters=n_filters * 4, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_13)
    conv_14 = BatchNormalization()(conv_14)
    upconv_3 = Conv2DTranspose(n_filters * 2, (kernel_size, kernel_size), strides=(2, 2), padding='same')(conv_14)
    concat_3 = concatenate([upconv_3, conv_4])
    concat_3 = Dropout(dropout)(concat_3)
    conv_15 = Conv2D(filters=n_filters * 2, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(concat_3)
    conv_15 = BatchNormalization()(conv_15)
    conv_16 = Conv2D(filters=n_filters * 2, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_15)
    conv_16 = BatchNormalization()(conv_16)
    upconv_4 = Conv2DTranspose(n_filters * 1, (kernel_size, kernel_size), strides=(2, 2), padding='same')(conv_16)
    concat_4 = concatenate([upconv_4, conv_2])
    concat_4 = Dropout(dropout)(concat_4)
    conv_17 = Conv2D(filters=n_filters * 1, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(concat_4)
    conv_17 = BatchNormalization()(conv_17)
    conv_18 = Conv2D(filters=n_filters * 1, kernel_size=(kernel_size, kernel_size), activation='relu', kernel_initializer="he_normal", padding="same")(conv_17)
    conv_18 = BatchNormalization()(conv_18)
    conv_19 = Conv2D(1, (1, 1), activation='sigmoid')(conv_18)
    model = Model(inputs=input_image, outputs=conv_19)
    return model
callbacks = [
    EarlyStopping(patience=10, verbose=1),
    ReduceLROnPlateau(factor=0.1, patience=3, min_lr=0.00001, verbose=1),
    ModelCheckpoint("model-prototype.h5", verbose=1, save_best_only=True, save_weights_only=True),
]
train_templates_path = "E:/train/templates"
train_masks_path = "E:/train/masks"
valid_templates_path = "E:/valid/templates"
valid_masks_path = "E:/valid/masks"
TRAIN_SET_SIZE = len(os.listdir(train_templates_path))
VALID_SET_SIZE = len(os.listdir(valid_templates_path))
BATCH_SIZE = 1
EPOCHS = 100
STEPS_PER_EPOCH = TRAIN_SET_SIZE // BATCH_SIZE  # integer division so the step count is an int, not a float
VALIDATION_STEPS = VALID_SET_SIZE // BATCH_SIZE
IMAGE_WIDTH = 1536
train_generator = data_gen(train_templates_path, train_masks_path, IMAGE_WIDTH, batch_size = BATCH_SIZE)
val_generator = data_gen(valid_templates_path, valid_masks_path, IMAGE_WIDTH, batch_size = BATCH_SIZE)
input_image = Input((IMAGE_WIDTH, IMAGE_WIDTH, 3), name='img')
model = get_unet(input_image, n_filters=16, kernel_size = 3, dropout=0.05)
model.compile(optimizer=Adam(lr=0.001), loss="binary_crossentropy", metrics=["accuracy"])
results = model.fit_generator(train_generator, epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH, validation_data=val_generator, validation_steps=VALIDATION_STEPS, callbacks=callbacks)
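For context on the final `Memory allocation failure` line, here is a rough back-of-the-envelope sketch of how much memory a single activation tensor takes at each encoder level of this U-Net (IMAGE_WIDTH = 1536, n_filters = 16, BATCH_SIZE = 1). The helper names are made up for this post, and float32 activations are an assumption based on the Keras default dtype. Training keeps many more tensors than this (batch normalization, dropout and decoder outputs, gradients, optimizer state), so this is not an estimate of total usage and does not by itself explain the error.

```python
BYTES_PER_FLOAT32 = 4  # assumption: activations are stored as float32


def feature_map_bytes(width, channels, batch_size=1):
    """Bytes for one square feature map of shape (batch, width, width, channels)."""
    return batch_size * width * width * channels * BYTES_PER_FLOAT32


def encoder_feature_maps(image_width, n_filters, levels=5):
    """Yield (width, channels) for one conv output per encoder level.

    Each level halves the width (2x2 max pooling) and doubles the filters,
    mirroring get_unet above.
    """
    width, channels = image_width, n_filters
    for _ in range(levels):
        yield width, channels
        width //= 2
        channels *= 2


if __name__ == "__main__":
    total = 0
    for width, channels in encoder_feature_maps(1536, 16):
        size = feature_map_bytes(width, channels)
        total += size
        print(f"{width}x{width}x{channels}: {size / 2**20:.0f} MiB")
    print(f"one tensor per level: {total / 2**20:.0f} MiB")
```

A single 1536x1536x16 tensor already takes 144 MiB, and each full-resolution layer in the first and last blocks of the network produces at least one tensor of that size.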