I have installed TensorFlow on my TX2 dev board using the instructions from:
For the most part everything works; I even installed Keras on top of it, which makes development much faster and easier. However, I've started getting a very strange intermittent error when executing my neural networks.
TensorFlow detects the GPU, but fails during the 'Creating TensorFlow device' step with the error:
GPU sync failed.
Has anyone else come across this error before? I've tried reinstalling TensorFlow a few times but the error persists.
A common cause is a mismatch between the CUDA libraries and the driver version.
Please remember that the wheel file shared here requires a JetPack 3.2 (CUDA 9) environment.
If you are already on JetPack 3.2, could you share the detailed error log with us?
I do have JetPack 3.2, but I had the developer preview before upgrading to the production release. So I’m going to try and uninstall JetPack from my host machine and reinstall everything fresh as opposed to upgrading. Below is my error log:
/home/nvidia/.local/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
2018-03-20 12:21:16.070911: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-03-20 12:21:16.071098: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
totalMemory: 7.67GiB freeMemory: 5.60GiB
2018-03-20 12:21:16.071150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Adding visible gpu device 0
2018-03-20 12:21:17.330181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:987] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5115 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-03-20 12:21:18.116623: E tensorflow/stream_executor/cuda/cuda_driver.cc:1110] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN :: *** Begin stack trace ***
*** End stack trace ***
Traceback (most recent call last):
File “Python2FastBoatFinder.py”, line 18, in <module>
model = load_model(’…/…/models/Gen5_Ship_Classifier.h5’)
File “/home/nvidia/.local/lib/python2.7/site-packages/keras/models.py”, line 246, in load_model
File “/home/nvidia/.local/lib/python2.7/site-packages/keras/engine/topology.py”, line 3382, in load_weights_from_hdf5_group
File “/home/nvidia/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py”, line 2373, in batch_set_value
File “/home/nvidia/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py”, line 192, in get_session
[tf.is_variable_initialized(v) for v in candidate_vars])
File “/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py”, line 895, in run
File “/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py”, line 1128, in _run
feed_dict_tensor, options, run_metadata)
File “/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py”, line 1344, in _do_run
File “/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py”, line 1363, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
I have an update.
I reinstalled JetPack 3.2 on my host and re-flashed my TX2. The problem persists, but it may be related to the amount of freeMemory that TensorFlow reports at startup. So far, every time freeMemory is over 5GiB the GPU sync error occurs. When freeMemory is in the 4GiB range, the GPU sync error sometimes occurs and other memory errors sometimes occur instead. I have not seen the error when freeMemory is in the 3GiB range.
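To make it easier to correlate failures with the reported freeMemory, the value can be pulled out of the startup log automatically. The following is only a minimal sketch: it assumes the log line has the same "totalMemory: ... freeMemory: ..." shape as in the log posted above, and the helper name parse_gpu_memory is made up for illustration.

```python
import re

# Matches the property line printed when TensorFlow finds the GPU, e.g.
# "totalMemory: 7.67GiB freeMemory: 5.60GiB" (format taken from the log above).
_MEM_LINE = re.compile(
    r"totalMemory:\s*([\d.]+)GiB\s+freeMemory:\s*([\d.]+)GiB"
)


def parse_gpu_memory(log_text):
    """Return (total_gib, free_gib) found in the log text, or None if absent.

    Hypothetical helper, not part of TensorFlow or Keras.
    """
    match = _MEM_LINE.search(log_text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))
```

Running this over the captured stderr of each run, and recording whether the run failed, would show whether the 5GiB / 4GiB / 3GiB pattern described above really holds.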
Well, I have a solution. I am using the following code to reduce the amount of GPU memory that TensorFlow grabs:
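import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
# Cap TensorFlow at 30% of GPU memory instead of letting it claim everything.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.3
# Make Keras use a session built with this config.
set_session(tf.Session(config=config))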
I can set the GPU memory fraction as high as 0.5, but that only works sometimes. 0.3 has yet to error on me.
I think the problem is that TensorFlow by default wants to allocate all of the available GPU memory. On the TX2, however, the GPU memory is also the CPU memory, so the OS or other processes may not allow TensorFlow to allocate everything it asks for. I guess what I really need to do is learn CUDA so I can take full advantage of the TX2!
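Since the TX2 shares one pool of memory between CPU and GPU, one option is to derive the fraction from what the OS currently reports as available rather than hard-coding 0.3. This is a rough sketch only: the function names, the headroom factor, and the cap are assumptions chosen for illustration, not values verified on the TX2, and it relies on the standard Linux /proc/meminfo fields MemTotal and MemAvailable.

```python
def meminfo_field_kib(meminfo_text, field):
    """Return the value (in KiB) of one field from /proc/meminfo contents."""
    prefix = field + ":"
    for line in meminfo_text.splitlines():
        if line.startswith(prefix):
            return int(line.split()[1])
    raise ValueError("field not found in meminfo: " + field)


def suggest_memory_fraction(meminfo_text, headroom=0.5, cap=0.5, floor=0.05):
    """Suggest a per_process_gpu_memory_fraction for a shared-memory device.

    headroom: share of the currently available memory to hand to TensorFlow
              (assumed heuristic).
    cap:      upper bound; 0.5 was the highest value reported to work at all
              in this thread.
    floor:    lower bound so the result is never zero (assumed).
    """
    total = meminfo_field_kib(meminfo_text, "MemTotal")
    available = meminfo_field_kib(meminfo_text, "MemAvailable")
    fraction = headroom * available / float(total)
    return max(floor, min(cap, fraction))
```

On the board itself this could be called as suggest_memory_fraction(open("/proc/meminfo").read()), with the result assigned to config.gpu_options.per_process_gpu_memory_fraction before the session is created.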
Thanks for sharing your status with us.
This is a known issue for TensorFlow on Jetson.
It's recommended to limit how much memory TensorFlow requests with this configuration:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)
Per the TensorFlow documentation, TensorFlow will by default try to allocate all of the available GPU memory.
The available memory on the TX2 can sometimes be very high (up to 6.2 GB).
On an integrated-GPU (iGPU) platform, such a large allocation will generally fail because the host and GPU share the same memory.
The workaround restricts the amount of memory that is allocated.
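The two settings discussed in this thread, allow_growth and per_process_gpu_memory_fraction, can also be set together so that memory grows on demand but never past a fixed share. Below is a minimal sketch for the TensorFlow 1.x API with the Keras TensorFlow backend; the helper names make_gpu_config and install_keras_session are hypothetical, and whether combining both options is needed on a given board is an assumption to verify.

```python
def make_gpu_config(memory_fraction=None, allow_growth=True):
    """Build a tf.ConfigProto that limits TensorFlow's GPU memory use.

    memory_fraction: optional upper bound in (0, 1], e.g. 0.3.
    allow_growth:    allocate GPU memory on demand instead of up front.
    """
    if memory_fraction is not None and not 0.0 < memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be in (0, 1]")

    import tensorflow as tf  # TensorFlow 1.x API, imported lazily

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config


def install_keras_session(config):
    """Create a session from the config and register it with Keras."""
    import tensorflow as tf
    from keras.backend.tensorflow_backend import set_session

    session = tf.Session(config=config)
    set_session(session)
    return session
```

Calling install_keras_session(make_gpu_config(memory_fraction=0.3)) before load_model should ensure the model is loaded into the restricted session rather than a default one.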
You can check this topic for more information.