TensorFlow 1.5 on TX2 Errors

RyanSummers · February 13, 2018, 11:22pm

After initially following https://developer.nvidia.com/embedded/linux-tegra to install the R28.2 version of Linux for Tegra, I manually grabbed the Debian packages from the JetPack 3.2 developer preview (I couldn’t get JetPack itself to work) and installed cuDNN 7 and CUDA 9.0 (9.0.252) onto the Jetson TX2.

I then used the Python wheel provided here (GitHub - peterlee0127/tensorflow-nvJetson: TensorFlow for NVIDIA Jetson, also include patch and script for building.) to install TensorFlow 1.5 on the Jetson and was able to start basic TensorFlow sessions.

However, I am testing an implementation of a deep neural network for object detection. I have successfully tested the network on a desktop workstation running CUDA 9.0 (9.0.176) and cuDNN 7. When I ran the same network on the Jetson, it worked only intermittently: TensorFlow sporadically begins throwing errors when inference is performed on an image.

2018-02-13 22:45:04.701424: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:881] could not open file to read NUMA node: /sys/bus/pci/devices/0000:00:00.0/numa_node
Your kernel may have been built without NUMA support.
2018-02-13 22:45:04.701537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.66GiB
2018-02-13 22:45:04.701583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2018-02-13 22:45:05.898156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:859] Could not identify NUMA node of /job:localhost/replica:0/task:0/device:GPU:0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2018-02-13 22:45:12.327744: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:650] failed to record completion event; therefore, failed to create inter-stream dependency
2018-02-13 22:45:12.327835: E tensorflow/stream_executor/event.cc:40] could not create CUDA event: CUDA_ERROR_UNKNOWN
Segmentation fault (core dumped)

The first error, relating to the NUMA node, occurs on every run and appears to have little effect on the program. The second error (the failure to create a CUDA event), however, kills the program with a segmentation fault. Once it occurs, I have only been able to get the program to run again after rebooting the Jetson a number of times.

Any insight into the cause of this error would be greatly appreciated. I do not believe that this is a problem with TensorFlow itself, as the program works on both my laptop (without GPU support) and a desktop NVIDIA machine using a GTX 1080 Ti. However, if it turns out to be one, I will bring my concerns over to the TensorFlow GitHub issues instead.

RyanSummers · February 14, 2018, 5:23pm

As an update, I was able to get image inference running on TensorFlow 1.5 by using the wheel (GitHub - peterlee0127/tensorflow-nvJetson: TensorFlow for NVIDIA Jetson, also include patch and script for building.) provided in this post (Available: TensorFlow 1.5 for Jetson TX2 - Jetson TX2 - NVIDIA Developer Forums). Note that the author of that wheel compiled TensorFlow 1.5 against CUDA 8 and cuDNN 6, not CUDA 9 and cuDNN 7 as mentioned on the GitHub page.

It appears that the error lies within the CUDA 9 or cuDNN 7 libraries, as I have continued to use L4T R28.2 with the downgraded CUDA libraries with success. Hopefully this helps someone else get TensorFlow 1.5 inference running on the Jetson!

RyanSummers · February 15, 2018, 4:54am

After running the above-mentioned solution for a day, the original errors popped up again. It appears that an internal CUDA call in TensorFlow is failing: cuEventCreate() is returning CUDA_ERROR_UNKNOWN. Is this an issue with the drivers available for the Jetson TX2?

RyanSummers · February 17, 2018, 1:26am

After some discussion on the GitHub TensorFlow models issues list, it was discovered that these errors are likely due to the Jetson TX2 running out of memory.
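One rough way to check for this on the device is to look at how much memory Linux reports as available before starting inference, since the CPU and GPU on the TX2 share the same physical memory. The sketch below parses /proc/meminfo-style text; the function names and the 2 GiB warning threshold are illustrative assumptions and do not come from the linked discussion.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_gib(text):
    """Return MemAvailable in GiB, falling back to MemFree if it is absent."""
    info = parse_meminfo(text)
    kib = info.get("MemAvailable", info.get("MemFree", 0))
    return kib / (1024.0 * 1024.0)


def report(path="/proc/meminfo", warn_below_gib=2.0):
    """Print available memory; warn_below_gib is a hypothetical threshold."""
    with open(path) as f:
        free = available_gib(f.read())
    print("Available memory: %.2f GiB" % free)
    if free < warn_below_gib:
        print("Warning: little memory left for CUDA allocations")
    return free
```

Calling report() on the Jetson just before creating the TensorFlow session gives a quick indication of whether the system is already short on memory.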

Please see object_detection: Trained SSD-Inception-v2 Inference Errors on Jetson TX2 · Issue #3390 · tensorflow/models · GitHub for more information. Limiting the amount of memory available to the GPU in a TensorFlow session avoids the CUDA errors, because it ensures that the Linux system and the CUDA system both have the memory they require. There are code samples at the provided link describing how to do this in TensorFlow.
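For readers who want the general shape of that fix, here is a minimal sketch using the TensorFlow 1.x session configuration API (tf.GPUOptions with per_process_gpu_memory_fraction). It is not necessarily the exact code from the linked issue. The helper names and the amount of memory reserved for Linux are assumptions chosen for illustration.

```python
def gpu_memory_fraction(total_gib, reserve_gib):
    """Fraction of total memory TensorFlow may use, leaving reserve_gib for the OS."""
    if total_gib <= 0:
        raise ValueError("total_gib must be positive")
    fraction = (total_gib - reserve_gib) / float(total_gib)
    # Clamp to the valid range for per_process_gpu_memory_fraction.
    return max(0.0, min(1.0, fraction))


def make_limited_session(fraction):
    """Create a TF 1.x session capped at the given fraction of GPU memory."""
    import tensorflow as tf  # imported lazily; requires TensorFlow 1.x

    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=fraction)
    config = tf.ConfigProto(gpu_options=gpu_options)
    return tf.Session(config=config)
```

With the 7.67 GiB total reported in the log above, something like make_limited_session(gpu_memory_fraction(7.67, 4.0)) would leave roughly 4 GiB for the rest of the system; the right reserve depends on what else is running and has to be found by experiment. Setting allow_growth=True on tf.GPUOptions is another TensorFlow 1.x option that allocates GPU memory on demand instead of up front.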

AastaLLL · February 23, 2018, 2:39am

Hi,

Thanks for keeping us updated.

Here is a TF-1.5 with CUDA 9.0 package for your reference:

Thanks.

arafatsharif2 · July 26, 2019, 1:32pm

thanks for the updates
