When running inference with TensorFlow I get this error. Sometimes I manage to get one inference out, and then I get this error on the second inference.
2018-01-27 10:45:22.901361: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-01-27 10:45:22.901502: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 5.69GiB
2018-01-27 10:45:22.901554: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2018-01-27 10:45:22.901575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2018-01-27 10:45:22.901600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2018-01-27 10:45:39.212173: E tensorflow/stream_executor/cuda/cuda_driver.cc:1068] failed to synchronize the stop event: CUDA_ERROR_LAUNCH_FAILED
2018-01-27 10:45:39.212275: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x6516170: CUDA_ERROR_LAUNCH_FAILED
2018-01-27 10:45:39.212312: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x6516170: CUDA_ERROR_LAUNCH_FAILED
2018-01-27 10:45:39.212479: F tensorflow/stream_executor/cuda/cuda_dnn.cc:2045] failed to enqueue convolution on stream: CUDNN_STATUS_EXECUTION_FAILED
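For context, the failing script is just a standard TF 1.x frozen-graph inference pass. A minimal sketch of that pattern is below; the model path and tensor names are placeholders rather than the real graph, and GPU memory growth is enabled because the TX2's GPU shares physical memory with the CPU:

```python
import numpy as np
import tensorflow as tf

# Load a frozen graph; 'frozen_model.pb' and the tensor names below are
# placeholders for illustration, not the actual model.
graph_def = tf.GraphDef()
with tf.gfile.GFile('frozen_model.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

# Letting TF grow its GPU allocation instead of grabbing everything up front
# is a common mitigation on the TX2, where CPU and GPU share physical memory.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(graph=graph, config=config) as sess:
    image = np.zeros((1, 224, 224, 3), dtype=np.float32)  # dummy input batch
    output = sess.run('output:0', feed_dict={'input:0': image})
    print(output.shape)
```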
Usually, a CUDA launch failure is caused by an incompatible CUDA library/driver combination.
Could you share more information about your environment?
1. Which JetPack version do you use?
2. How did you install TensorFlow? Did you build it from source or install a public wheel?
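As a starting point, the TensorFlow-side details (version, CUDA build, visible devices) can be printed from the same interpreter that runs the inference; a minimal sketch:

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# Version and build information for the interpreter running the inference.
print('TensorFlow version: %s' % tf.__version__)
print('Built with CUDA support: %s' % tf.test.is_built_with_cuda())

# Devices TensorFlow can actually see; the Tegra X2 GPU should appear here.
for dev in device_lib.list_local_devices():
    print('%s  %s' % (dev.name, dev.physical_device_desc))
```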
All of the combinations give the same problem. I have a script that runs inference on some images. I am able to run the script once, and then when I restart it I get the error. Even rebooting doesn't solve the problem.
However, I noticed that if I remove libcudnn6 with apt and install it again, the script runs once, and when it is restarted the error occurs again.
Is it something to do with cuDNN?
Is it a good idea to try to compile TF with cuDNN 7?
2018-01-30 09:13:08.565407: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-01-30 09:13:08.565560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 5.22GiB
2018-01-30 09:13:08.565617: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2018-01-30 09:13:08.565645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2018-01-30 09:13:08.565676: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
2018-01-30 09:16:20.210420: E tensorflow/stream_executor/cuda/cuda_timer.cc:54] Internal: error destroying CUDA event in context 0x6811920: CUDA_ERROR_LAUNCH_FAILED
2018-01-30 09:16:20.210558: E tensorflow/stream_executor/cuda/cuda_timer.cc:59] Internal: error destroying CUDA event in context 0x6811920: CUDA_ERROR_LAUNCH_FAILED
2018-01-30 09:16:20.211604: F tensorflow/stream_executor/cuda/cuda_dnn.cc:2045] failed to enqueue convolution on stream: CUDNN_STATUS_EXECUTION_FAILED
========= Error: process didn't terminate successfully
========= Internal error (20)
========= No CUDA-MEMCHECK results found
I compiled TF 1.5 for Python 2 against CUDA 8 and cuDNN 7.
I also have an 8GB swap file enabled.
It seems that I get the error only the first time I run the inference. After that all runs are successful. The same happens even after a reboot.
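Since the F-level log line is fatal and aborts the whole process, any retry has to happen from outside the script. A hedged sketch of one way to cope with the "first run fails, later runs succeed" pattern (not necessarily the workaround referenced below); the script name is a placeholder:

```python
import subprocess
import sys
import time

# Runs the inference script as a child process and retries on a nonzero
# exit code. 'inference.py' is a placeholder for the actual script name.
def run_with_retry(cmd, retries=2, delay=2.0):
    result = 1
    for attempt in range(retries + 1):
        result = subprocess.call(cmd)
        if result == 0:
            return 0
        print('attempt %d exited with code %d, retrying...' % (attempt, result))
        time.sleep(delay)
    return result

if __name__ == '__main__':
    sys.exit(run_with_retry([sys.executable, 'inference.py']))
```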
Thanks for the update. It looks like the error comes from the TF application.
Although there is a workaround shared by am2266, it's still recommended to file an issue with TensorFlow.
I have just one more doubt. I never managed to install JetPack from a host using the installer, so I installed the components manually. For cuDNN I installed it with apt install libcudnn7-dev; is this OK?
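As a sanity check that the apt-installed library is the one the process actually loads at run time, a small ctypes probe can print the cuDNN version; this assumes the library is installed as libcudnn.so.7:

```python
import ctypes

# Load the cuDNN shared library the dynamic loader resolves at run time
# (assumes it is installed as libcudnn.so.7) and query its version.
cudnn = ctypes.CDLL('libcudnn.so.7')
cudnn.cudnnGetVersion.restype = ctypes.c_size_t
print('cuDNN runtime version: %d' % cudnn.cudnnGetVersion())  # e.g. 7005 for 7.0.5
```

If the reported version does not match the cuDNN that TensorFlow was built against, that mismatch would fit the launch failures above.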