tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

When I run darkflow, there is an error in TensorFlow.

Loading from .pb and .meta
GPU mode with 0.6 usage
2018-08-20 12:37:16.229377: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:864] ARM64 does not support NUMA - returning NUMA node zero
2018-08-20 12:37:16.229550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.04GiB
2018-08-20 12:37:16.229647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-08-20 12:37:20.054755: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-08-20 12:37:20.054849: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2018-08-20 12:37:20.054877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2018-08-20 12:37:20.055108: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4711 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-08-20 12:37:21.416670: E tensorflow/stream_executor/cuda/cuda_driver.cc:1108] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN :: *** Begin stack trace ***
stream_executor::cuda::CUDADriver::SynchronizeContext(stream_executor::cuda::CudaContext*)
stream_executor::StreamExecutor::SynchronizeAllActivity()
tensorflow::GPUUtil::SyncAll(tensorflow::Device*)
*** End stack trace ***

Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/bin/flow", line 6, in <module>
cliHandler(sys.argv)
File "/usr/local/lib/python3.5/dist-packages/darkflow/cli.py", line 26, in cliHandler
tfnet = TFNet(FLAGS)
File "/usr/local/lib/python3.5/dist-packages/darkflow/net/build.py", line 54, in __init__
self.build_from_pb()
File "/usr/local/lib/python3.5/dist-packages/darkflow/net/build.py", line 98, in build_from_pb
self.setup_meta_ops()
File "/usr/local/lib/python3.5/dist-packages/darkflow/net/build.py", line 146, in setup_meta_ops
self.sess.run(tf.global_variables_initializer())
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed

I am using JetPack 3.2, and TensorFlow is 1.9.

Hi,

Here are two suggestions for you:

1.
Please make sure the package you installed was built for JetPack 3.2.
A package built with the same JetPack version is required for the CUDA driver to run correctly.

2.
Could you try to decrease the batch size?

Thanks.

Thanks.

I will give it a try.

From my experience, with the setting

config.gpu_options.allow_growth = True

this error almost disappears.
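For reference, here is a minimal sketch of that setting in the TF 1.x API this thread is about; the bare `tf.Session(config=config)` call is an assumption, since your actual graph loading will differ:

```python
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory on demand instead of
# reserving nearly all of it up front, which can trigger sync
# failures on memory-constrained boards like the TX2.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

sess = tf.Session(config=config)
```

This is a session-configuration fragment, so it only changes how memory is allocated; it does not slow down or alter what the graph computes.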

Hi,

We have released an official TensorFlow package for TX2.
Please give it a try:
https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-/

Thanks.

This thread is a bit out of date, but I am running into a similar issue. I am developing a standalone application on the Jetson TX2 development platform with Jetpack 3.3 / Ubuntu 16.04. I wanted to see if I could build a version of Tensorflow from source, but I have been unable to do so after several tries. I installed TensorFlow 1.9 for Python 2.7 (I need Python 2.7 for compatibility reasons) via the recommendation from the forum admin:
https://devtalk.nvidia.com/default/topic/1038957/jetson-tx2/tensorflow-for-jetson-tx2-/

When I try to train a model, I see the following error:

[train]: 2019-06-20 14:57:04.286252: E tensorflow/stream_executor/cuda/cuda_driver.cc:1108] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN :: *** Begin stack trace ***
stream_executor::cuda::CUDADriver::SynchronizeContext(stream_executor::cuda::CudaContext*)
stream_executor::StreamExecutor::SynchronizeAllActivity()
tensorflow::GPUUtil::SyncAll(tensorflow::Device*)
*** End stack trace ***

I followed the recommendation given here:

https://devtalk.nvidia.com/default/topic/1031225/jetson-tx2/tensorflow-on-tx2-gpu-sync-error/

namely, putting these lines in the code:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True

session = tf.Session(config=config, ...)

I am able to train a model, but inference appears to be too slow for my application's requirements. I notice significant latency between when data are sampled and ingested and when results from the model are displayed in my application. Have others noticed performance issues with the official TensorFlow 1.9.0 build for the TX2, or with other versions?

Thank you.

It is a bit hard to judge without knowing what your model looks like, the data size, etc. So, some general advice:

  1. Check the performance profile; by default, the TX2 runs in a power-saving mode
  2. Check that the model is optimised for inference
  3. Check that the model is "warmed up" (the first inferences after loading a model are usually much slower than later ones, due to the way TF/CUDA/... work)
  4. Consider batch/streaming modes
  5. Check that your code does not write intensively to flash storage (including logs)
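On point 3, warm-up is easy to measure with a framework-agnostic sketch. `FakeModel` below is a hypothetical stand-in for your real `sess.run` call: it pays a one-time setup cost on the first invocation, the way CUDA kernel loading and cuDNN autotuning do for a freshly loaded model:

```python
import time

def warm_up(run_inference, dummy_input, n_runs=5):
    """Run a few throwaway inferences before serving real traffic,
    so one-time costs (CUDA kernel loading, cuDNN autotuning,
    graph optimization) are paid up front. Returns per-run timings."""
    timings = []
    for _ in range(n_runs):
        start = time.time()
        run_inference(dummy_input)
        timings.append(time.time() - start)
    return timings

class FakeModel:
    """Hypothetical stand-in for a real inference call: the first
    invocation simulates one-time initialization cost."""
    def __init__(self):
        self.initialized = False
    def __call__(self, x):
        if not self.initialized:
            time.sleep(0.05)  # simulated one-time setup
            self.initialized = True
        return x

timings = warm_up(FakeModel(), dummy_input=[0.0] * 4)
```

In practice you would call `warm_up` once at startup with a dummy tensor of the same shape as your real inputs, and only then start measuring latency.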

One last note: on my models and data, TensorRT was 2x-3x faster than TensorFlow. But, unfortunately, the conversion process was nowhere near "automatic" and required a decent amount of trial and error.