tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

2020-03-03 13:46:22.931387: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-03 13:46:22.931705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 74.65GiB/s
2020-03-03 13:46:22.931735: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-03-03 13:46:22.931745: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-03 13:46:22.931753: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-03 13:46:22.931762: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-03 13:46:22.931770: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-03 13:46:22.931777: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-03 13:46:22.931786: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-03 13:46:22.931846: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-03 13:46:22.932151: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-03 13:46:22.932406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-03-03 13:46:22.932425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-03 13:46:22.932431: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-03-03 13:46:22.932435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-03-03 13:46:22.932514: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-03 13:46:22.932810: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-03 13:46:22.933071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1519 MB memory) → physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
2020-03-03 13:46:22.938490: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-03 13:46:22.938811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 74.65GiB/s
2020-03-03 13:46:22.938846: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-03-03 13:46:22.938858: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-03 13:46:22.938868: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-03 13:46:22.938877: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-03 13:46:22.938887: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-03 13:46:22.938895: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-03 13:46:22.938905: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-03 13:46:22.938990: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-03 13:46:22.939289: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-03 13:46:22.939542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-03-03 13:46:22.939965: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-03 13:46:22.940235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.176GHz coreCount: 5 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 74.65GiB/s
2020-03-03 13:46:22.940255: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-03-03 13:46:22.940266: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-03 13:46:22.940275: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-03-03 13:46:22.940284: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-03-03 13:46:22.940292: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-03-03 13:46:22.940301: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-03-03 13:46:22.940309: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-03 13:46:22.940356: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-03 13:46:22.940649: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-03 13:46:22.940922: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-03-03 13:46:22.940941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-03 13:46:22.940947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-03-03 13:46:22.940951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-03-03 13:46:22.941059: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-03 13:46:22.941425: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-03 13:46:22.941691: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1519 MB memory) → physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
Found 2000 images belonging to 2 classes.
Found 1000 images belonging to 2 classes.
Epoch 1/30
2020-03-03 13:46:44.569343: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-03-03 13:46:44.744792: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-03-03 13:46:45.456272: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-03-03 13:46:45.460627: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

2020-03-03 13:46:45.460696: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node vgg16/block1_conv1/convolution}}]]

Hi,

This could be due to running out of GPU memory (OOM). Could you try reducing the TF GPU memory fraction, config.gpu_options.per_process_gpu_memory_fraction?

If the issue persists, could you please try a fresh cuDNN installation?

Thanks
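
For anyone hitting this on TF 1.x, here is a minimal sketch of how that option is typically set (the 0.6 fraction is just an illustrative value to tune for your GPU):

import tensorflow as tf

# Cap how much GPU memory this process may pre-allocate.
# 0.6 is only an example value; lower it if cuDNN still fails to
# initialize, or raise it if training becomes memory-starved.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.6
# config.gpu_options.allow_growth = True  # alternative: allocate on demand

sess = tf.Session(config=config)
# With tf.keras on TF 1.x you would also register the session, e.g.
# tf.keras.backend.set_session(sess), before building the model.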

WARNING:tensorflow:From /opt/digits/digits/tools/tensorflow/tf_data.py:472: string_input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensor_slices(string_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs). If shuffle=False, omit the .shuffle(...).
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py:278: input_producer (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensor_slices(input_tensor).shuffle(tf.shape(input_tensor, out_type=tf.int64)[0]).repeat(num_epochs). If shuffle=False, omit the .shuffle(...).
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py:190: limit_epochs (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.from_tensors(tensor).repeat(num_epochs).
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py:113: count_up_to (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Prefer Dataset.range instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py:2132: count_up_to (from tensorflow.python.ops.state_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Prefer Dataset.range instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py:199: __init__ (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py:199: add_queue_runner (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py:202: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
WARNING:tensorflow:From /opt/digits/digits/tools/tensorflow/tf_data.py:547: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
tf.py_function, which takes a python function which manipulates tf eager
tensors instead of numpy arrays. It’s easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means tf.py_functions can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
WARNING:tensorflow:From /opt/digits/digits/tools/tensorflow/tf_data.py:410: batch (from tensorflow.python.training.input) is deprecated and will be removed in a future version.
Instructions for updating:
Queue-based input pipelines have been replaced by tf.data. Use tf.data.Dataset.batch(batch_size) (or padded_batch(...) if dynamic_pad=True).
2020-03-09 09:25:27.604742: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2599990000 Hz
2020-03-09 09:25:27.605793: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x608a6f0 executing computations on platform Host. Devices:
2020-03-09 09:25:27.605824: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): <undefined>, <undefined>
2020-03-09 09:25:27.681267: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-03-09 09:25:27.681599: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x60c1090 executing computations on platform CUDA. Devices:
2020-03-09 09:25:27.681614: I tensorflow/compiler/xla/service/service.cc:168] StreamExecutor device (0): GeForce GTX 1660 Ti, Compute Capability 7.5
2020-03-09 09:25:27.681700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
totalMemory: 5.80GiB freeMemory: 5.44GiB
2020-03-09 09:25:27.681711: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-09 09:25:28.406091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-09 09:25:28.406117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-09 09:25:28.406123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-09 09:25:28.406214: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 5183 MB memory) → physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py:1627: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
2020-03-09 09:25:28.781846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-09 09:25:28.781869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-09 09:25:28.781888: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-09 09:25:28.781892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-09 09:25:28.781944: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 5183 MB memory) → physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-03-09 09:25:28.828769: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-03-09 09:25:28.828791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-03-09 09:25:28.828797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-03-09 09:25:28.828802: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-03-09 09:25:28.828846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5183 MB memory) → physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /opt/digits/digits/tools/tensorflow/model.py:212: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the tf.data module.
2020-03-09 09:25:29.223468: I tensorflow/stream_executor/dso_loader.cc:153] successfully opened CUDA library libcublas.so.10 locally
2020-03-09 09:25:30.771916: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-03-09 09:25:30.779090: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/opt/digits/digits/tools/tensorflow/main.py", line 745, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/opt/digits/digits/tools/tensorflow/main.py", line 568, in main
    Validation(sess, val_model, 0)
  File "/opt/digits/digits/tools/tensorflow/main.py", line 378, in Validation
    summary_str = sess.run(model.summary)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
  [[node val/model/conv1/Conv2D (defined at :20) ]]

Caused by op u'val/model/conv1/Conv2D', defined at:
  File "/opt/digits/digits/tools/tensorflow/main.py", line 745, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/opt/digits/digits/tools/tensorflow/main.py", line 507, in main
    val_model.create_model(UserModel, stage_scope) # noqa
  File "/opt/digits/digits/tools/tensorflow/model.py", line 157, in create_model
    tower_model.inference # touch to initialize
  File "/opt/digits/digits/tools/tensorflow/utils.py", line 37, in decorator
    setattr(self, attribute, function(self))
  File "", line 20, in inference
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1158, in convolution2d
    conv_dims=2)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1061, in convolution
    outputs = layer.apply(inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1227, in apply
    return self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 530, in call
    outputs = super(Layer, self).call(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in call
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/layers/convolutional.py", line 194, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 966, in call
    return self.conv_op(inp, filter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 591, in call
    return self.call(inp, filter)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_ops.py", line 208, in call
    name=self.name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
  [[node val/model/conv1/Conv2D (defined at :20) ]]

I have received the same error while running DIGITS on my dataset. Could you please help? I have already installed cuDNN.

Hi,

Can you share the script and model file to reproduce the issue so we can help better?

Thanks

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

I think this piece of code from here (Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED - #2 by NVES_R) might help you, as it did in my case.
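
If memory growth alone does not help (for example on a small card like the GTX 960M above), a related TF 2.x option is to give TensorFlow a fixed memory budget instead. A minimal sketch, where the 1024 MB limit is just an illustrative value:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Restrict TensorFlow to a fixed slice of the first GPU's memory.
        # 1024 MB is only an example; leave headroom for cuDNN workspaces.
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        # Like memory growth, this must be set before the GPU is initialized
        print(e)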