tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

When I run darkflow, there is an error in TensorFlow.

Loading from .pb and .meta
GPU mode with 0.6 usage
2018-08-20 12:37:16.229377: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2018-08-20 12:37:16.229550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.04GiB
2018-08-20 12:37:16.229647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-08-20 12:37:20.054755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-20 12:37:20.054849: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2018-08-20 12:37:20.054877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2018-08-20 12:37:20.055108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4711 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-08-20 12:37:21.416670: E tensorflow/stream_executor/cuda/cuda_driver.cc:1108] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN :: *** Begin stack trace ***
stream_executor::cuda::CUDADriver::SynchronizeContext(stream_executor::cuda::CudaContext*)
stream_executor::StreamExecutor::SynchronizeAllActivity()
tensorflow::GPUUtil::SyncAll(tensorflow::Device*)
*** End stack trace ***

Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/bin/flow", line 6, in <module>
cliHandler(sys.argv)
File "/usr/local/lib/python3.5/dist-packages/darkflow/cli.py", line 26, in cliHandler
tfnet = TFNet(FLAGS)
File "/usr/local/lib/python3.5/dist-packages/darkflow/net/build.py", line 54, in __init__
self.build_from_pb()
File "/usr/local/lib/python3.5/dist-packages/darkflow/net/build.py", line 98, in build_from_pb
self.setup_meta_ops()
File "/usr/local/lib/python3.5/dist-packages/darkflow/net/build.py", line 146, in setup_meta_ops
self.sess.run(tf.global_variables_initializer())
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

I am using JetPack 3.2, and TensorFlow is 1.9.

Hi,

Here are two suggestions for you:

1.
Please make sure the package you installed was built for JetPack 3.2.
A package built with the same JetPack version is required for the CUDA driver to run correctly.

2.
Could you try to decrease the batch size?

Thanks.

Thanks.

I will give it a try.

From my experience, with the setting

config.gpu_options.allow_growth = True

this error almost disappears.
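For reference, here is a minimal sketch of that setting in the TF 1.x API this thread is about; the bare `tf.Session(config=config)` call is an assumption, since your actual graph loading will differ:

```python
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory on demand instead of
# reserving nearly all of it up front, which can trigger sync
# failures on memory-constrained boards like the TX2.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

sess = tf.Session(config=config)
```

This is a session-configuration fragment, so it only changes how memory is allocated; it does not slow down or alter what the graph computes.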

Hi,

We have released an official TensorFlow package for TX2.
Please give it a try:
https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-/

Thanks.

This thread is a bit out of date, but I am running into a similar issue. I am developing a standalone application on the Jetson TX2 development platform with Jetpack 3.3 / Ubuntu 16.04. I wanted to see if I could build a version of Tensorflow from source, but I have been unable to do so after several tries. I installed TensorFlow 1.9 for Python 2.7 (I need Python 2.7 for compatibility reasons) via the recommendation from the forum admin:
https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-/

When I try to train a model, I see the following error:

[train]: 2019-06-20 14:57:04.286252: E tensorflow/stream_executor/cuda/cuda_driver.cc:1108] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN :: *** Begin stack trace ***
stream_executor::cuda::CUDADriver::SynchronizeContext(stream_executor::cuda::CudaContext*)
stream_executor::StreamExecutor::SynchronizeAllActivity()
tensorflow::GPUUtil::SyncAll(tensorflow::Device*)
*** End stack trace ***

I followed the recommendation given here:

https://devtalk.nvidia.com/default/topic/1031225/jetson-tx2/tensorflow-on-tx2-gpu-sync-error/

namely, putting these lines in the code:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True

session = tf.Session(config=config, ...)

I am able to train a model, but inference appears to be too slow for my application's requirements. I notice significant latency between when data are sampled and ingested and when results from the model are displayed in my application. Have others noticed performance issues with the official TensorFlow 1.9.0 build for the TX2, or with other versions?

Thank you.

It is a bit hard to judge without knowing what your model looks like, the data size, etc. So, some general advice:

  1. Check the performance profile; by default, the TX2 runs in a power-saving mode
  2. Check that the model is optimised for inference
  3. Check that the model is "warmed up" (the first inferences after loading a model are usually much slower than later ones, due to the way TF/CUDA/... work)
  4. Consider batch/streaming modes
  5. Check that your code does not write intensively to flash storage (including logs)
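On point 3, warm-up is easy to measure with a framework-agnostic sketch. `FakeModel` below is a hypothetical stand-in for your real `sess.run` call: it pays a one-time setup cost on the first invocation, the way CUDA kernel loading and cuDNN autotuning do for a freshly loaded model:

```python
import time

def warm_up(run_inference, dummy_input, n_runs=5):
    """Run a few throwaway inferences before serving real traffic,
    so one-time costs (CUDA kernel loading, cuDNN autotuning,
    graph optimization) are paid up front. Returns per-run timings."""
    timings = []
    for _ in range(n_runs):
        start = time.time()
        run_inference(dummy_input)
        timings.append(time.time() - start)
    return timings

class FakeModel:
    """Hypothetical stand-in for a real inference call: the first
    invocation simulates one-time initialization cost."""
    def __init__(self):
        self.initialized = False
    def __call__(self, x):
        if not self.initialized:
            time.sleep(0.05)  # simulated one-time setup
            self.initialized = True
        return x

timings = warm_up(FakeModel(), dummy_input=[0.0] * 4)
```

In practice you would call `warm_up` once at startup with a dummy tensor of the same shape as your real inputs, and only then start measuring latency.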

One last note: on my models and data, TensorRT was 2x-3x faster than TensorFlow. But, unfortunately, the conversion process was nowhere near "automatic" and required a decent amount of trial and error.