CUDA failure when running TensorFlow inference

When running inference with TensorFlow I get the error below. Sometimes I get one inference out, and then this error appears on the second inference.

2018-01-27 10:45:22.901361: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-01-27 10:45:22.901502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 5.69GiB
2018-01-27 10:45:22.901554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2018-01-27 10:45:22.901575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2018-01-27 10:45:22.901600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2018-01-27 10:45:39.212173: E tensorflow/stream_executor/cuda/cuda_driver.cc:1068] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED
2018-01-27 10:45:39.212275: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x6516170: CUDA_ERROR_LAUNCH_FAILED
2018-01-27 10:45:39.212312: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x6516170: CUDA_ERROR_LAUNCH_FAILED
2018-01-27 10:45:39.212479: F tensorflow/stream_executor/cuda/cuda_dnn.cc:2045] failed to enqueue convolution on stream: CUDNN_STATUS_EXECUTION_FAILED

Appreciate your thoughts on this!

Hi,

Usually, a CUDA launch failure is caused by an incompatible CUDA library/driver.
Could you share more information about your environment?
1. Which JetPack version do you use?
2. How did you install TensorFlow? Did you build it from source or install a public wheel? (A quick version check like the one below would also help.)
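
For example, you can report this from Python with a small check like the following (this assumes a TF 1.x install; it is only a sanity check, not required):

import tensorflow as tf

print(tf.__version__)                 # TensorFlow release you are running
print(tf.test.is_built_with_cuda())   # True if this build was compiled with CUDA support
print(tf.test.gpu_device_name())      # e.g. '/device:GPU:0' if TensorFlow can see the GPU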

Thanks.

Hi AastaLLL,

Thanks for your reply. I'm running Python 3.5 and TensorFlow 1.3.

I installed JetPack 3.1 and built TensorFlow from source, based on this GitHub repo:

I am able to run smaller, simpler TensorFlow networks on the TX2, but for my larger network I get the above error.
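
For reference, my script follows the usual TF 1.x frozen-graph inference pattern. Here is a rough sketch of the idea (the model path, tensor names, and GPU options below are illustrative placeholders, not my exact code):

import numpy as np
import tensorflow as tf

# Load the frozen graph (placeholder path).
with tf.gfile.GFile('frozen_model.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

# Ask TF to allocate GPU memory on demand; on the TX2 the CPU and GPU
# share the same 8GB, so grabbing it all up front can be an issue.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(graph=graph, config=config) as sess:
    inp = graph.get_tensor_by_name('input:0')    # placeholder tensor names
    out = graph.get_tensor_by_name('output:0')
    result = sess.run(out, feed_dict={inp: np.zeros((1, 224, 224, 3), np.float32)})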

Hi,

From your log, the error is raised from cuEventSynchronize():
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/stream_executor/cuda/cuda_driver.cc#L1080

Could you help collect a cuda-memcheck log and share it with us?

cuda-memcheck python [source].py

Thanks.

Hi,

I am having the same problem. I tried these combinations:

All of the combinations give the same problem. I have a script that runs inference on some images. I can run the script once, but when I restart it I get the error. Even rebooting doesn't solve the problem.
However, I noticed that if I remove libcudnn6 with apt and install it again, the script runs once, and when it is restarted the error occurs again.
Could this be related to cuDNN?
Would it be a good idea to try compiling TF with cuDNN 7?

Thanks.

cuda-command-line-tools-8-0                            8.0.84-1                
cuda-core-8-0                                          8.0.84-1        
cuda-cublas-8-0                                        8.0.84-1                        
cuda-cublas-dev-8-0                                    8.0.84-1                         
cuda-cudart-8-0                                        8.0.84-1                      
cuda-cudart-dev-8-0                                    8.0.84-1                               
cuda-cufft-8-0                                         8.0.84-1                       
cuda-cufft-dev-8-0                                     8.0.84-1            
cuda-curand-8-0                                        8.0.84-1                        
cuda-curand-dev-8-0                                    8.0.84-1                         
cuda-cusolver-8-0                                      8.0.84-1                             
cuda-cusolver-dev-8-0                                  8.0.84-1                              
cuda-cusparse-8-0                                      8.0.84-1                          
cuda-cusparse-dev-8-0                                  8.0.84-1                           
cuda-documentation-8-0                                 8.0.84-1           
cuda-driver-dev-8-0                                    8.0.84-1                            
cuda-license-8-0                                       8.0.84-1      
cuda-misc-headers-8-0                                  8.0.84-1                   
cuda-npp-8-0                                           8.0.84-1                     
cuda-npp-dev-8-0                                       8.0.84-1                      
cuda-nvgraph-8-0                                       8.0.84-1                         
cuda-nvgraph-dev-8-0                                   8.0.84-1                          
cuda-nvml-dev-8-0                                      8.0.84-1                       
cuda-nvrtc-8-0                                         8.0.84-1                       
cuda-nvrtc-dev-8-0                                     8.0.84-1                        
cuda-repo-l4t-8-0-local                                8.0.84-1                            
cuda-samples-8-0                                       8.0.84-1                  
cuda-toolkit-8-0                                       8.0.84-1                      
libcudnn6                                              6.0.21-1+cuda8.0        
nv-gie-repo-ubuntu1604-ga-cuda8.0-trt2.1-20170614      1-1                                   
nv-tensorrt-repo-ubuntu1604-rc-cuda8.0-trt3.0-20170922 3.0.0-1

It seems the process crashes. Here is what I get:

2018-01-30 09:13:08.565407: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-01-30 09:13:08.565560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 5.22GiB
2018-01-30 09:13:08.565617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2018-01-30 09:13:08.565645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2018-01-30 09:13:08.565676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2018-01-30 09:16:20.210420: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x6811920: CUDA_ERROR_LAUNCH_FAILED
2018-01-30 09:16:20.210558: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x6811920: CUDA_ERROR_LAUNCH_FAILED
2018-01-30 09:16:20.211604: F tensorflow/stream_executor/cuda/cuda_dnn.cc:2045] failed to enqueue convolution on stream: CUDNN_STATUS_EXECUTION_FAILED
========= Error: process didn't terminate successfully
========= Internal error (20)
========= No CUDA-MEMCHECK results found

Hi pvaezi,

I compiled TF 1.5 for Python 2 against CUDA 8 and cuDNN 7.
I also have an 8GB swap file enabled.
It seems that I get the error only the first time I run the inference; after that, all runs succeed. The same happens even after a reboot.

You can find the wheel here if you want to try: https://drive.google.com/open?id=1gV6W7J47P9UjixoKHQ-xZPLR9k1pQNKX

Best

Hi both,

Thanks for the update. It looks like the error comes from the TF application itself.
Although there is a workaround shared by am2266, it is still recommended to file an issue with the TensorFlow team.

Thanks.

I have just one more question. I never managed to install JetPack from a host using the installer, so I installed the components manually. For cuDNN, I installed it with apt install libcudnn7-dev. Is this OK?

Thanks

Thanks am2266, I will try your solution.

AastaLLL, I will share this problem with the TensorFlow team.

Hi am2266,

Please install the Jetson packages from JetPack.
Usually, the packages downloaded from our website are for desktop GPUs.

Thanks.