I copied the code from GitHub and ran the sample on both the CPU and the GPU. The results really shocked me.
On the CPU, the average detection time was about 1.2 s. On the GPU, it was about 7.6 s.
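One thing I could check is whether the GPU number is dominated by one-time startup work (CUDA context creation, memory allocation and similar first-run costs) rather than by the detection itself. A minimal sketch in plain Python, assuming the detection step can be wrapped in a zero-argument callable. `time_call` is a hypothetical helper, not part of yolo_tensorflow:

```python
import time


def time_call(fn, warmup=1, runs=5):
    """Time fn() and return (first_call_seconds, steady_state_average_seconds).

    The first `warmup` calls are timed but excluded from the average, so
    one-time startup costs do not inflate the steady-state number.
    """
    if warmup < 1 or runs < 1:
        raise ValueError("warmup and runs must both be at least 1")

    first = None
    for _ in range(warmup):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        if first is None:
            first = elapsed  # keep only the very first call's duration

    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start

    return first, total / runs
```

For example, `time_call(lambda: detector.detect(image), warmup=2, runs=10)`, where `detector` and `image` are placeholders for whatever objects test.py actually builds. If the first-call time is large and the steady-state average is much smaller, the 7.6 s figure would mostly reflect startup cost rather than per-image speed.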
The code is from yolo_tensorflow [url]https://github.com/hizhangp/yolo_tensorflow[/url]
My environment:
Jetson TX2
TensorFlow 1.3
Cuda compilation tools, release 8.0, V8.0.72
Python 3.5.2
The output with the GPU:
nvidia@tegra-ubuntu:/media/nvidia/YYFSD/DL/YOLO$ python test.py
2018-04-27 15:43:36.106786: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:857] ARM64 does not support NUMA - returning NUMA node zero
2018-04-27 15:43:36.106972: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 3.76GiB
2018-04-27 15:43:36.107041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2018-04-27 15:43:36.107071: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y
2018-04-27 15:43:36.107138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0)
Restoring weights from: data/weights/YOLO_small.ckpt
Average detecting time: 7.569s
The output with the CPU:
Restoring weights from: data/weights/YOLO_small.ckpt
Average detecting time: 1.196s
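To put the two runs side by side, a small sketch that pulls the "Average detecting time" value out of each captured log and computes how many times slower the GPU run was. The function names are hypothetical, and the two log strings are just the summary lines above pasted in by hand:

```python
import re

# Matches the summary line printed by test.py, e.g. "Average detecting time: 7.569s"
_AVG_RE = re.compile(r"Average detecting time:\s*([0-9]+(?:\.[0-9]+)?)s")


def average_detect_time(log_text):
    """Return the first 'Average detecting time' value (seconds) in log_text."""
    match = _AVG_RE.search(log_text)
    if match is None:
        raise ValueError("no 'Average detecting time' line found")
    return float(match.group(1))


def slowdown(gpu_log, cpu_log):
    """Return how many times slower the GPU run was than the CPU run."""
    return average_detect_time(gpu_log) / average_detect_time(cpu_log)


gpu_log = ("Restoring weights from: data/weights/YOLO_small.ckpt\n"
           "Average detecting time: 7.569s\n")
cpu_log = ("Restoring weights from: data/weights/YOLO_small.ckpt\n"
           "Average detecting time: 1.196s\n")
print("GPU/CPU time ratio: {:.2f}".format(slowdown(gpu_log, cpu_log)))
# prints "GPU/CPU time ratio: 6.33"
```

So with these numbers the GPU run is roughly 6.3 times slower than the CPU run, not faster.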
I would really appreciate it if someone could help me out.
Sincerely.