Running TensorFlow 1.3 on TX2 gets stuck

Hi, I’m running TensorFlow with Python 3.5 on a TX2, but it seems unstable. It runs normally only the first time I launch my Python script; every time after that I get messages like the ones below and it hangs.

2017-12-12 06:02:47.064075: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-12 06:02:47.064203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 4.18GiB
2017-12-12 06:02:47.064255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-12-12 06:02:47.064279: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-12-12 06:02:47.064310: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2017-12-12 06:04:09.279612: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-12 06:04:09.279745: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 4.33GiB
2017-12-12 06:04:09.279795: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-12-12 06:04:09.279830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-12-12 06:04:09.279868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)

The GPU information is printed twice; it should appear only once when the script runs normally.
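As a quick way to confirm the duplicate initialization, the log can be scanned for repeated device-creation lines (a small sketch I wrote for illustration; the marker string matches the logs above):

```python
# Count how many times TensorFlow reports creating the GPU device.
# More than one occurrence in a single run suggests the device is
# being initialized twice (e.g. by two processes or two sessions).

MARKER = "Creating TensorFlow device"

def count_device_creations(log_text):
    """Return the number of GPU device-creation lines in a TF log."""
    return sum(1 for line in log_text.splitlines() if MARKER in line)

log = """\
2017-12-12 06:02:47: I ... Creating TensorFlow device (/gpu:0) -> ...
2017-12-12 06:04:09: I ... Creating TensorFlow device (/gpu:0) -> ...
"""
print(count_device_creations(log))  # -> 2 for the log above
```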

I rebooted the TX2 just now and got an error message like this:

2017-12-12 06:21:32.375742: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-12 06:21:32.375870: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 5.09GiB
2017-12-12 06:21:32.375923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-12-12 06:21:32.376007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-12-12 06:21:32.376039: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2017-12-12 06:22:14.858684: E tensorflow/stream_executor/cuda/cuda_driver.cc:1068] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED
2017-12-12 06:22:14.858769: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0xaedda10: CUDA_ERROR_LAUNCH_FAILED
2017-12-12 06:22:14.858799: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0xaedda10: CUDA_ERROR_LAUNCH_FAILED
2017-12-12 06:22:14.858956: F tensorflow/stream_executor/cuda/cuda_dnn.cc:2045] failed to enqueue convolution on stream: CUDNN_STATUS_EXECUTION_FAILED
2017-12-12 06:23:02.713872: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-12 06:23:02.713999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 3.72GiB
2017-12-12 06:23:02.714054: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-12-12 06:23:02.714079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-12-12 06:23:02.714105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)

Hi,

Which TensorFlow build do you use?

Usually, we use this public build:

We can launch TensorFlow correctly with JetPack3.1.
Could you also give it a try?

Thanks.

I built my TensorFlow according to https://syed-ahmed.gitbooks.io/nvidia-jetson-tx2-recipes/content/first-question.html and it runs OK.

python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow as tf
>>> print(tf.__version__)
1.3.0
>>>

Hi, my problem remains even though TensorFlow itself seems to work fine. Can you help me?

Hi,

Could you try TensorFlow 1.3.0 or the wheel shared in comment #4?
Based on this issue, the CUDA_ERROR_LAUNCH_FAILED error went away after upgrading the environment to TensorFlow 1.3.0 and cuDNN v6.

Thanks.

The build at https://github.com/peterlee0127/tensorflow-tx2 says it is only for Python 2.7, not Python 3.5. I’ve built TensorFlow 1.3.0 from source and also used the Python 3.5 build at https://github.com/jetsonhacks/installTensorFlowJetsonTX; both give the same errors as above.

Note: for me it does appear to work with Python 2.7 using the build from https://github.com/peterlee0127/tensorflow-tx2; I’m just not sure why it doesn’t work with Python 3.5.

Here are similar problems reported by other users:

https://dev-videos.com/videos/V51IO7kNXCg/TensorFlow-Install-on-NVIDIA-Jetson-TX2

https://github.com/tensorflow/tensorflow/issues/15075

Someone suggested reducing the batch size, but my script does inference, not training: https://stackoverflow.com/questions/47116203/tensorflow-cuda-fails-with-error-failed-to-enqueue-convolution-on-stream-cudnn

Hi,

CUDA_ERROR_LAUNCH_FAILED usually comes from incorrect CUDA version/driver or GPU architecture.

Here is another public TensorFlow build for Python 3.5:

Could you reflash the TX2 with JetPack 3.1 and give this wheel a try?
Thanks.

Yes, I did flash my TX2 with JetPack 3.1, and I just uninstalled and reinstalled TensorFlow as you recommended, but the error remains the same. Thank you for your help.

Hi,

Thanks for your feedback.
We will check this issue and get back to you with more information later.

Hi,

We can run TensorFlow correctly with python 3.5:

nvidia@tegra-ubuntu:/media/nvidia/NVIDIA$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2017-12-15 03:22:31.509179: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-15 03:22:31.509304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 369.01MiB
2017-12-15 03:22:31.509358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-12-15 03:22:31.509383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-12-15 03:22:31.509406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'

Here are our steps:
1. Flash TX2 with JetPack3.1
2. Upgrade cuDNNv7 via this package
3. Install TensorFlow

$ sudo apt-get install -y python3-pip python3-dev
$ pip3 install tensorflow-1.3.0-cp35-cp35m-linux_aarch64.whl
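To confirm that step 2 actually upgraded cuDNN, one option is to read the version macros from cudnn.h (an illustrative sketch; the header usually lives under /usr/include or /usr/include/aarch64-linux-gnu depending on the install, and the helper name is mine):

```python
import re

def cudnn_major_version(header_text):
    """Extract CUDNN_MAJOR from the contents of cudnn.h, or None."""
    m = re.search(r"#define\s+CUDNN_MAJOR\s+(\d+)", header_text)
    return int(m.group(1)) if m else None

# Example usage against the real header (path may differ per install):
# with open("/usr/include/cudnn.h") as f:
#     print(cudnn_major_version(f.read()))  # expect 7 after the upgrade

sample = "#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 0\n"
print(cudnn_major_version(sample))  # -> 7
```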

Could you follow our steps and check if the issue remains?
If yes, please help to test a CUDA sample for GPU functionality.

$ /usr/local/cuda-8.0/bin/cuda-install-samples-8.0.sh .
$ cd NVIDIA_CUDA-8.0_Samples/0_Simple/vectorAdd
$ make && ./vectorAdd

Thanks, and please let us know the results.

Updating cuDNN via the tar file does not seem to work; the output of the check is:

sudo dpkg -l | grep TensorRT
[sudo] password for nvidia: 
ii  libnvinfer-dev                        3.0.2-1+cuda8.0      arm64        TensorRT development libraries and headers
ii  libnvinfer3                           3.0.2-1+cuda8.0        arm64        TensorRT runtime libraries
ii  tensorrt-2.1.2                        3.0.2-1+cuda8.0        arm64        Meta package of TensorRT

Whereas installing via the deb file seems to work better:

sudo dpkg -l | grep TensorRT

ii  libnvinfer-dev                                              4.0.0-1+cuda8.0                                       arm64        TensorRT development libraries and headers
ii  libnvinfer-samples                                          4.0.0-1+cuda8.0                                       arm64        TensorRT samples and documentation
ii  libnvinfer3                                                 3.0.2-1+cuda8.0                                       arm64        TensorRT runtime libraries
ii  libnvinfer4                                                 4.0.0-1+cuda8.0                                       arm64        TensorRT runtime libraries
ii  tensorrt                                                    3.0.0-1+cuda8.0                                       arm64        Meta package of TensorRT
ii  tensorrt-2.1.2                                              3.0.2-1+cuda8.0                                       arm64        Meta package of TensorRT

Copying the test files:

/usr/local/cuda-8.0/bin/cuda-install-samples-8.0.sh .
Copying samples to ./NVIDIA_CUDA-8.0_Samples now...
Finished copying samples.

Running test code:

make && ./vectorAdd
/usr/local/cuda-8.0/bin/nvcc -ccbin g++ -I../../common/inc  -m64    -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_62,code=compute_62 -o vectorAdd.o -c vectorAdd.cu
/usr/local/cuda-8.0/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_62,code=compute_62 -o vectorAdd vectorAdd.o 
mkdir -p ../../bin/aarch64/linux/release
cp vectorAdd ../../bin/aarch64/linux/release
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

The TensorFlow test script runs fine, but note the reported memory: “Total memory: 7.67GiB, Free memory: 369.01MiB”. I ran my inference script and the problem remains:

2017-12-15 05:55:47.193361: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-15 05:55:47.193494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 3.67GiB
2017-12-15 05:55:47.193548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-12-15 05:55:47.193576: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-12-15 05:55:47.193603: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2017-12-15 05:57:12.935098: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2017-12-15 05:57:12.935343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 3.82GiB
2017-12-15 05:57:12.935439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-12-15 05:57:12.935483: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-12-15 05:57:12.935531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)

I’m testing a Faster R-CNN (ResNet v2) inference script; the same script runs fine on a small PC box (Intel i5, 4 GB RAM, no GPU) at 44 seconds per image (600x800).

I checked my script’s running state: every time I launch my script message_server.py, two processes are running:

ps -aux | grep python

nvidia   2945   39.6  2.0 1800792   165368   pts/7  Sl+  06:23   0:07    python3   message_server.py
nvidia   3021   95.5  8.2 1713920   662340   pts/7  R+   06:23   0:13    python3   message_server.py
nvidia   3034   0.0   0.0 5560      604      pts/2  S+   06:23   0:00    grep     --color=auto message_server.py
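For reference, the same `ps | grep` check can be done in pure Python by scanning /proc (a Linux-only sketch; the helper name is mine):

```python
import os

def count_processes_matching(needle):
    """Count processes whose command line contains `needle` (Linux /proc)."""
    count = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue  # skip non-process entries like /proc/meminfo
        try:
            with open("/proc/%s/cmdline" % pid, "rb") as f:
                cmdline = f.read().replace(b"\x00", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited or permission denied
        if needle in cmdline:
            count += 1
    return count

# Two results here would mirror the duplicate message_server.py entries above.
print(count_processes_matching("message_server.py"))
```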

I tested another script, test_tensorflow.py, whose content is:

import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

and only one process appears.

So could the problem be caused by two processes competing for the GPU? But how did this happen?
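If two processes really are sharing the GPU, one general TF 1.x configuration worth trying (a sketch based on the standard TensorFlow 1.x session options, not something suggested earlier in this thread; it needs a working TF install to run) is to stop each session from mapping nearly all GPU memory up front:

```python
import tensorflow as tf

# By default TF 1.x maps almost all free GPU memory when a session is
# created, which starves any second process sharing the same GPU.
# Either grow allocations on demand, or cap each process to a fraction.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Alternatively, a hard cap per process:
# config.gpu_options.per_process_gpu_memory_fraction = 0.4

sess = tf.Session(config=config)
```

On a shared-memory board like the TX2 (where CPU and GPU draw from the same 8 GB), this also leaves more memory for the rest of the system.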

After upgrading to cuDNNv7, it works with Python 3.5 for me. Thanks AastaLLL.

Sorry, my problem remains, but updating to cuDNN v7 worked according to garrett.floft’s reply. I’ll close this topic.

Actually, I ran into a situation where TensorFlow on a TX1 gets stuck and runs very slowly with both Python 3.5 and Python 2.7. My TX1 has R28.1.
Does anyone know how to update to cuDNN v7?

I fixed my TensorFlow hang by rebuilding the Tegra R28.1 kernel and creating a swap file.

And the NUMA warning does not really affect things.
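On the swap-file point: after creating and enabling the swap file, /proc/meminfo can confirm it is active (a Linux-only sketch; the helper name is mine):

```python
def swap_total_kb(meminfo_text):
    """Return SwapTotal in kB from the contents of /proc/meminfo, or 0."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return 0

# Example usage on a live system:
# with open("/proc/meminfo") as f:
#     print(swap_total_kb(f.read()))  # > 0 once the swap file is enabled

sample = "MemTotal: 8039168 kB\nSwapTotal: 4194300 kB\n"
print(swap_total_kb(sample))  # -> 4194300
```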