Trouble with TensorFlow and TX2

edge0222 · February 23, 2018, 1:14am

Hi, I’m running TensorFlow 1.4.0 with Python 3.5 on a TX2, but it seems unstable.
When I run the Python script from the TensorFlow tutorials, I get the following errors most of the time (though not on every run):

nvidia@tegra-ubuntu:~/classify_image$ python3 classify_image.py

2018-02-19 14:06:50.212300: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:856] ARM64 does not support NUMA - returning NUMA node zero
2018-02-19 14:06:50.212441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 5.68GiB
2018-02-19 14:06:50.212503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-02-19 14:06:52.777955: W tensorflow/core/framework/op_def_util.cc:334] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
2018-02-19 14:07:06.326244: E tensorflow/stream_executor/cuda/cuda_driver.cc:1080] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED
2018-02-19 14:07:06.326328: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x2f09f40: CUDA_ERROR_LAUNCH_FAILED
2018-02-19 14:07:06.326373: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x2f09f40: CUDA_ERROR_LAUNCH_FAILED
2018-02-19 14:07:06.326530: E tensorflow/stream_executor/cuda/cuda_dnn.cc:2279] failed to enqueue convolution on stream: CUDNN_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape([1,80,73,73]) filter shape([3,3,80,192])
         [[Node: conv_4/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](conv_3, conv_4/conv2d_params)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "classify_image.py", line 227, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "classify_image.py", line 193, in main
    run_inference_on_image(image)
  File "classify_image.py", line 157, in run_inference_on_image
    {'DecodeJpeg/contents:0': image_data})
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape([1,80,73,73]) filter shape([3,3,80,192])
         [[Node: conv_4/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](conv_3, conv_4/conv2d_params)]]

Caused by op 'conv_4/Conv2D', defined at:
  File "classify_image.py", line 227, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "classify_image.py", line 193, in main
    run_inference_on_image(image)
  File "classify_image.py", line 144, in run_inference_on_image
    create_graph()
  File "classify_image.py", line 127, in create_graph
    _ = tf.import_graph_def(graph_def, name='')
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 313, in import_graph_def
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InternalError (see above for traceback): cuDNN launch failure : input shape([1,80,73,73]) filter shape([3,3,80,192])
         [[Node: conv_4/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](conv_3, conv_4/conv2d_params)]]

2018-02-19 14:07:06.726066: E tensorflow/stream_executor/event.cc:33] error destroying CUDA event in context 0x2f09f40: CUDA_ERROR_LAUNCH_FAILED

System Information:

Jetson TX2
JetPack 3.1
Python 3.5.2
TensorFlow 1.4.0 https://github.com/lukejocz/tensorflow-1.4.0-cp35-cp35m-linux_aarch64
Python script: "classify_image.py" from the TensorFlow tutorials https://www.tensorflow.org/tutorials/image_recognition https://github.com/tensorflow/models/tree/master/tutorials/image/imagenet/classify_image.py

Thanks.

AastaLLL · March 1, 2018, 9:30am

Hi,

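Could you limit TensorFlow's GPU memory allocation and give it a try?

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving most of it at startup.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Pass the config when the session is created.
session = tf.Session(config=config)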

Thanks.
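
For anyone applying the suggestion above to classify_image.py, here is a minimal sketch of how the option could be wired in, assuming the TensorFlow 1.x API. The helper name build_gpu_config and its parameters are hypothetical, not part of TensorFlow or the tutorial script. The optional memory_fraction sets the standard per_process_gpu_memory_fraction GPU option as a hard cap; whether either setting resolves the cuDNN launch failure in this thread is not confirmed.

```python
def build_gpu_config(tf, allow_growth=True, memory_fraction=None):
    """Build a TF 1.x ConfigProto that avoids grabbing all GPU memory up front.

    `tf` is the imported tensorflow module (passed in so the helper has no
    import-time dependency). This helper is illustrative only.
    """
    config = tf.ConfigProto()
    # Grow the GPU allocation as needed rather than reserving it at startup.
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        # Optional hard cap, as a fraction of total GPU memory (0.0 to 1.0).
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config


# Usage sketch (TensorFlow 1.x), wherever the script creates its session:
#     import tensorflow as tf
#     with tf.Session(config=build_gpu_config(tf)) as sess:
#         ...  # run inference as before
```

On the TX2 the GPU shares physical memory with the CPU, so on-demand allocation leaves more memory for the rest of the system than the default behavior of reserving most of it when the session starts.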
