Hi, I'm running TensorFlow 1.4.0 with Python 3.5 on a Jetson TX2, but it seems unstable.
When I run the classify_image.py script from the TensorFlow tutorials, it fails in most runs (though not every time) with the following errors:
nvidia@tegra-ubuntu:~/classify_image$ python3 classify_image.py
2018-02-19 14:06:50.212300: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:856] ARM64 does not support NUMA - returning NUMA node zero
2018-02-19 14:06:50.212441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 5.68GiB
2018-02-19 14:06:50.212503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-02-19 14:06:52.777955: W tensorflow/core/framework/op_def_util.cc:334] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
2018-02-19 14:07:06.326244: E tensorflow/stream_executor/cuda/cuda_driver.cc:1080] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED
2018-02-19 14:07:06.326328: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x2f09f40: CUDA_ERROR_LAUNCH_FAILED
2018-02-19 14:07:06.326373: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x2f09f40: CUDA_ERROR_LAUNCH_FAILED
2018-02-19 14:07:06.326530: E tensorflow/stream_executor/cuda/cuda_dnn.cc:2279] failed to enqueue convolution on stream: CUDNN_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1323, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
status, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape([1,80,73,73]) filter shape([3,3,80,192])
[[Node: conv_4/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](conv_3, conv_4/conv2d_params)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "classify_image.py", line 227, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "classify_image.py", line 193, in main
run_inference_on_image(image)
File "classify_image.py", line 157, in run_inference_on_image
{'DecodeJpeg/contents:0': image_data})
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape([1,80,73,73]) filter shape([3,3,80,192])
[[Node: conv_4/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](conv_3, conv_4/conv2d_params)]]
Caused by op 'conv_4/Conv2D', defined at:
File "classify_image.py", line 227, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "classify_image.py", line 193, in main
run_inference_on_image(image)
File "classify_image.py", line 144, in run_inference_on_image
create_graph()
File "classify_image.py", line 127, in create_graph
_ = tf.import_graph_def(graph_def, name='')
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 313, in import_graph_def
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): cuDNN launch failure : input shape([1,80,73,73]) filter shape([3,3,80,192])
[[Node: conv_4/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](conv_3, conv_4/conv2d_params)]]
2018-02-19 14:07:06.726066: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x2f09f40: CUDA_ERROR_LAUNCH_FAILED
System Information:
- Jetson TX2
- JetPack 3.1
- Python 3.5.2
- TensorFlow 1.4.0 https://github.com/lukejocz/tensorflow-1.4.0-cp35-cp35m-linux_aarch64
- Python script: "classify_image.py" from the TensorFlow tutorials (https://www.tensorflow.org/tutorials/image_recognition, https://github.com/tensorflow/models/tree/master/tutorials/image/imagenet/classify_image.py)
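In case it helps anyone compare setups, the Python and TensorFlow versions listed above can be printed with a small script along these lines. This is only a sketch: the helper name collect_versions is my own (not part of TensorFlow), and it reads the standard __version__ attribute rather than anything TX2-specific.

```python
import importlib
import platform


def collect_versions(modules=("tensorflow",)):
    """Return a dict with the Python version, CPU architecture, and module versions.

    collect_versions is a hypothetical helper written for this post.
    """
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),  # CPU architecture string reported by the OS
    }
    for name in modules:
        try:
            module = importlib.import_module(name)
            # Most packages expose __version__; fall back if it is missing.
            info[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in sorted(collect_versions().items()):
        print("{}: {}".format(key, value))
```

The JetPack version is not covered by this script; I took it from the installer I used to flash the board.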
Thanks.