$ nvidia-smi
Thu Sep 5 11:20:56 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.40       Driver Version: 430.40       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 166...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   42C    P8     5W /  N/A |   4375MiB /  5944MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1992      G   /usr/lib/xorg/Xorg                           219MiB |
|    0      2219      G   /usr/bin/gnome-shell                          96MiB |
|    0      6431      G   ...quest-channel-token=8160790192539462645    46MiB |
|    0      6877      C   /usr/bin/python3                            4009MiB |
+-----------------------------------------------------------------------------+
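As a quick sanity check on the process table above, here is a minimal Python sketch that tallies GPU memory per process. The rows are pasted from the output above; the names `PROCESS_ROWS`, `ROW` and `memory_by_process` are made up for this illustration and are not part of any NVIDIA tool.

```python
import re

# Rows copied from the "Processes" table above (exact spacing does not matter).
PROCESS_ROWS = """\
| 0 1992 G /usr/lib/xorg/Xorg 219MiB |
| 0 2219 G /usr/bin/gnome-shell 96MiB |
| 0 6431 G ...quest-channel-token=8160790192539462645 46MiB |
| 0 6877 C /usr/bin/python3 4009MiB |
"""

# Fields: GPU index, PID, type (C = compute, G = graphics), process name, MiB.
ROW = re.compile(r"^\|\s*(\d+)\s+(\d+)\s+([CG](?:\+G)?)\s+(\S+)\s+(\d+)MiB\s*\|$")


def memory_by_process(rows):
    """Return (pid, type, name, MiB) tuples, largest memory user first."""
    found = []
    for line in rows.splitlines():
        m = ROW.match(line.strip())
        if m:
            _gpu, pid, kind, name, mib = m.groups()
            found.append((int(pid), kind, name, int(mib)))
    return sorted(found, key=lambda r: r[3], reverse=True)


usage = memory_by_process(PROCESS_ROWS)
total_mib = sum(r[3] for r in usage)                    # all listed processes
compute_mib = sum(r[3] for r in usage if r[1] == "C")   # compute processes only

for pid, kind, name, mib in usage:
    print(f"{pid:>6} {kind} {mib:>5} MiB  {name}")
print("total:", total_mib, "MiB; compute only:", compute_mib, "MiB")
```

With these rows the listed processes add up to 4370 MiB, and the single compute (`C`) process, /usr/bin/python3, accounts for 4009 MiB of the card's 5944 MiB. Whether that process has anything to do with the failure below is only a guess.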
This has been a Conv1D issue for a while.
I could get one of the examples from Francois Chollet’s book (Listing 6.46) to work after rebooting my system. Then, voila, the next example fails (Listing 6.46).
This is with a GeForce GTX 1660 Ti in a laptop running Ubuntu 18.04, cuDNN 7 for CUDA 10.0, Python 3.6 and tensorflow-gpu.
===================================== stacked 1D Conv network =============================
from keras.models import Sequential
from keras import layers
from keras.optimizers import RMSprop

# float_data, train_gen, val_gen and val_steps are defined earlier (not shown here)
model = Sequential()
model.add(layers.Conv1D(32, 5, activation='relu',
                        input_shape=(None, float_data.shape[-1])))
model.add(layers.MaxPooling1D(3))
model.add(layers.Conv1D(32, 5, activation='relu'))
model.add(layers.MaxPooling1D(3))
model.add(layers.Conv1D(32, 5, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))
model.compile(optimizer=RMSprop(), loss='mae')
history = model.fit_generator(train_gen,
                              steps_per_epoch=500,
                              epochs=20,
                              validation_data=val_gen,
                              validation_steps=val_steps)
~/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1456         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1457                                                self._handle, args,
-> 1458                                                run_metadata_ptr)
   1459         if run_metadata:
   1460           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv1d_1/convolution}}]]
[[loss/mul/_71]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv1d_1/convolution}}]]
0 successful operations.
0 derived errors ignored.
==========================================================================================
2019-09-05 12:02:43.301213: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-05 12:02:43.305059: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-09-05 12:02:43.409874: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-05 12:02:43.410234: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b0150 executing computations on platform CUDA. Devices:
2019-09-05 12:02:43.410263: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1660 Ti, Compute Capability 7.5
2019-09-05 12:02:43.430879: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400000000 Hz
2019-09-05 12:02:43.431295: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4160850 executing computations on platform Host. Devices:
2019-09-05 12:02:43.431312: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
…
2019-09-05 12:02:43.439496: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5185 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-09-05 12:02:44.021009: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-05 12:02:44.223999: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-05 12:02:44.606974: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-05 12:02:44.614763: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
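The error message suggests looking for a warning printed earlier, so here is a small sketch that pulls only the warning, error and fatal lines out of a TensorFlow log like the one above. It assumes the line layout seen in this log (timestamp, one-letter level, `file:line]`, message); `LOG_LINE`, `SAMPLE` and `warnings_and_errors` are illustrative names, not TensorFlow APIs.

```python
import re

# Assumed layout, taken from the log above:
#   2019-09-05 12:02:44.606974: E tensorflow/.../cuda_dnn.cc:329] message
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<level>[IWEF]) (?P<source>\S+:\d+)\] (?P<message>.*)$"
)


def warnings_and_errors(log_text):
    """Return (level, source, message) for every W, E or F line in the log."""
    hits = []
    for line in log_text.splitlines():
        m = LOG_LINE.match(line.strip())
        if m and m.group("level") in "WEF":
            hits.append((m.group("level"), m.group("source"), m.group("message")))
    return hits


# Three lines copied from the log above: one info line and the two error lines.
SAMPLE = """\
2019-09-05 12:02:44.223999: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-05 12:02:44.606974: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-05 12:02:44.614763: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
"""

hits = warnings_and_errors(SAMPLE)
for level, source, message in hits:
    print(level, source, message)
```

On the sample, only the two `Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR` lines from cuda_dnn.cc:329 are returned; the info line about libcudnn.so.7 loading successfully is filtered out.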