cuDNN failed to initialize

cuDNN failed to initialize while training using my RTX 2080

So after some more experimentation: a reboot, followed by the sequence below, made the 1D convolution work when executed right after the reboot.

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all up front
tf.keras.backend.set_session(tf.Session(config=config))

The thing to highlight is that this required a full reboot and had to be the first sequence executed, presumably because allow_growth only helps if it is set before anything else claims GPU memory.

This did not work when I tried it without a reboot; even shutting down and restarting the Jupyter notebook server did not help.
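For anyone on TensorFlow 2.x, where ConfigProto and set_session no longer exist, I believe the equivalent is tf.config.experimental.set_memory_growth (a minimal sketch, untested on my side):

import tensorflow as tf

# Enable on-demand memory growth for every visible GPU; this must run
# before anything else touches the GPU.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)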

Here’s what I have installed for reference, with a GTX 1660 Ti on an ASUS ROG Strix laptop under Ubuntu 18.04.

$ sudo dpkg -i libcudnn7_7.4.1.5-1+cuda10.0_amd64.deb libcudnn7-dev_7.4.1.5-1+cuda10.0_amd64.deb libcudnn7-doc_7.4.1.5-1+cuda10.0_amd64.deb
$ pip3 install --upgrade tensorflow-gpu==1.13.1
$ nvidia-smi
Sat Sep 7 12:02:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P0    33W /  N/A |   5011MiB /  5944MiB |     17%      Default |
+-------------------------------+----------------------+----------------------+
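As a quick sanity check after installing, you can also confirm that this TensorFlow build actually sees the card (stock TF 1.x API, nothing specific to my setup):

from tensorflow.python.client import device_lib

# Should print a /device:GPU:0 entry alongside the CPU device
print(device_lib.list_local_devices())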

==============================================================================

[1]
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))

from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop
[…]

model = Sequential()
model.add(layers.Conv1D(32, 5, activation='relu',
                        input_shape=(None, float_data.shape[-1])))
model.add(layers.MaxPooling1D(3))
model.add(layers.Conv1D(32, 5, activation='relu'))
model.add(layers.MaxPooling1D(3))
model.add(layers.Conv1D(32, 5, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))

model.compile(optimizer=RMSprop(), loss='mae')

history = model.fit_generator(train_gen,
                              steps_per_epoch=500,
                              epochs=20,
                              validation_data=val_gen,
                              validation_steps=val_steps)

2019-09-07 12:01:14.981980: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-07 12:01:14.982624: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2fc87c0 executing computations on platform CUDA. Devices:
2019-09-07 12:01:14.982643: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1660 Ti, Compute Capability 7.5
2019-09-07 12:01:15.012160: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
2019-09-07 12:01:15.013817: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x2fef210 executing computations on platform Host. Devices:
2019-09-07 12:01:15.013898: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-09-07 12:01:15.014242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
totalMemory: 5.80GiB freeMemory: 5.35GiB

[1]
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))

from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop
[…]
model = Sequential()
model.add(layers.Embedding(max_features, 128, input_length=max_len))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))

model.summary()

model.compile(optimizer=RMSprop(lr=1e-4),
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(x_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)

2019-09-07 12:13:01.916734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
totalMemory: 5.80GiB freeMemory: 838.06MiB
2019-09-07 12:13:01.916783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-09-07 12:13:01.917183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-07 12:13:01.917192: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-09-07 12:13:01.917199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-09-07 12:13:01.917248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 613 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-09-07 12:13:27.897105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-09-07 12:13:27.897140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-07 12:13:27.897146: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-09-07 12:13:27.897150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-09-07 12:13:27.897199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 613 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-09-07 12:13:28.416804: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-09-07 12:13:29.495347: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 650.38MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-09-07 12:13:29.495943: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 650.38MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-09-07 12:13:29.503194: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.02GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
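Note the contrast with the first run: right after the reboot the log shows freeMemory: 5.35GiB, while here only 838.06MiB is free and the allocator warnings follow. If a reboot is not an option, capping TensorFlow's share explicitly is another knob worth trying (TF 1.x API; the 0.4 fraction is just an example value, and I have not verified this alone fixes the cuDNN error):

import tensorflow as tf

config = tf.ConfigProto()
# Let TensorFlow claim at most ~40% of the card's memory,
# leaving headroom for whatever already holds the rest.
config.gpu_options.per_process_gpu_memory_fraction = 0.4
tf.keras.backend.set_session(tf.Session(config=config))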

Thanks for the information, but I am using an NVIDIA RTX 2080 GPU. If you know the installation procedure for that card, please help. Thanks in advance.