[Problem]
I cannot get better performance from the OpenCV GPU-accelerated API than from the normal OpenCV API.
For example, when detecting corners with the FAST algorithm, the GPU-accelerated API is
around 10 times slower than the normal API.
Of course, I skipped the 1st GPU-accelerated API call because it takes a very long time.
Is there any mistake in my measurement?
Could you please help me?
[Condition]
Target board : nVIDIA JETSON TX1
JetPack : 2.2.1 for L4T
OS : Ubuntu 14.04 LTS
OpenCV : libopencv4tegra-repo_2.4.13_arm64_l4t-r24.deb
Input : logo.png http://opencv.org/wp-content/themes/opencv/images/logo.png
[Measurement steps]
(1) Run the script linked below to set the CPU and GPU clocks to maximum.
https://devtalk.nvidia.com/default/topic/901337/jetson-tx1/cuda-7-0-jetson-tx1-performance-and-benchmarks/post/4747186/#4747186
(2) Run the test program.
[Test source code]
#include <ctime>
#include <iostream>
#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>

using namespace std;

#define TEST_LOOP (1000)

int main(int argc, char* argv[])
{
    int gpu_count = cv::gpu::getCudaEnabledDeviceCount();
    if (0 < gpu_count)
    {
        // preparation
        cv::Mat src_host = cv::imread("logo.png");
        if (src_host.empty())
        {
            std::cout << "failed to load logo.png" << endl;
            return 1;
        }
        cv::Mat gray_host;
        // cv::imread() returns BGR data, so convert with CV_BGR2GRAY
        cv::cvtColor(src_host, gray_host, CV_BGR2GRAY);

        // skip the 1st GPU API call because it may take a long time
        // (e.g. CUDA context initialization)
        cv::gpu::GpuMat src;
        cv::gpu::FAST_GPU fastGpu(20, true);
        cv::gpu::GpuMat keypoints_gpu;
        src.upload(gray_host);
        fastGpu(src, cv::gpu::GpuMat(), keypoints_gpu);

        // measure cv::FAST start
        // note: clock() returns processor time used by this process
        clock_t cpu_time_used = clock();
        vector<cv::KeyPoint> keypoints;
        for (int j = 0; j < TEST_LOOP; j++)
        {
            cv::FAST(gray_host, keypoints, 20, true);
        }
        // measure cv::FAST end
        cpu_time_used = clock() - cpu_time_used;
        std::cout << "cv::FAST : " << ((double) cpu_time_used) / CLOCKS_PER_SEC << " sec" << endl;

        // measure cv::gpu::FAST_GPU start
        // note: unlike the CPU loop, each iteration also uploads the image
        cpu_time_used = clock();
        for (int i = 0; i < TEST_LOOP; i++)
        {
            src.upload(gray_host);
            fastGpu(src, cv::gpu::GpuMat(), keypoints_gpu);
        }
        // measure cv::gpu::FAST_GPU end
        cpu_time_used = clock() - cpu_time_used;
        std::cout << "cv::gpu::FAST_GPU : " << ((double) cpu_time_used) / CLOCKS_PER_SEC << " sec" << endl;
    }
    else
    {
        std::cout << "no gpu" << endl;
    }
    return 0;
}
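Two details of this measurement may matter. First, clock() reports processor time consumed by the process, which can differ from elapsed wall time (for example, while the process is blocked waiting for the GPU). Second, the GPU loop also performs src.upload() on every iteration, while the CPU loop has no equivalent step, so the two loops do not time the same work. A minimal sketch of a wall-clock timing helper, assuming C++11 and std::chrono::steady_clock; the name time_per_iter_ms and its parameters are hypothetical, not part of OpenCV:

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: average wall-clock milliseconds per call of `body`.
// The first `warmup` calls run but are not timed, e.g. to absorb one-time
// CUDA initialization on the first GPU call.
double time_per_iter_ms(const std::function<void()>& body,
                        std::size_t warmup, std::size_t iters)
{
    for (std::size_t i = 0; i < warmup; ++i)
        body();

    // steady_clock measures elapsed time and never jumps backwards,
    // unlike clock(), which reports processor time used by this process.
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i)
        body();
    const auto stop = std::chrono::steady_clock::now();

    if (iters == 0)
        return 0.0;
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iters);
}
```

In the test program, each loop body would be wrapped in a lambda: one calling cv::FAST(gray_host, keypoints, 20, true), and one calling fastGpu(src, cv::gpu::GpuMat(), keypoints_gpu), with src.upload(gray_host) either inside the lambda (to include the transfer) or before it (to time detection alone). This assumes each GPU call returns only after its device work has finished; otherwise the timed region would need an explicit synchronization.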
[Result of the test program]
ubuntu@tegra-ubuntu:~/hoge/sample_opencv_app/bin$ ./sample
cv::FAST : 0.138623 sec
cv::gpu::FAST_GPU : 1.36099 sec
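Dividing the totals by TEST_LOOP = 1000 gives per-call times (the GPU figure includes one src.upload() per iteration):

```latex
t_{\mathrm{CPU}} = \frac{0.138623\,\mathrm{s}}{1000} \approx 0.139\,\mathrm{ms}, \qquad
t_{\mathrm{GPU}} = \frac{1.36099\,\mathrm{s}}{1000} \approx 1.36\,\mathrm{ms}, \qquad
\frac{t_{\mathrm{GPU}}}{t_{\mathrm{CPU}}} \approx 9.8
```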
jachen
October 17, 2016, 9:53am
Hello,
Would you please test this code with a larger image?
I tried a 16-megapixel image, and the GPU took only about 1/4 of the CPU time.
GPU processing has extra overhead, so it pays off better with large data.
br
Chenjian
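The overhead argument can be made concrete with a simplified linear cost model. This model is an assumption for illustration and was not measured in this thread: for n pixels, the CPU takes a*n, while the GPU takes a fixed per-call cost t0 (e.g. kernel-launch and synchronization latency) plus a per-pixel cost b (which would include the upload). Under this model the GPU is faster only above the break-even size n* = t0 / (a - b), and never if b >= a. The function name and parameters below are hypothetical:

```cpp
#include <limits>

// Simplified model (assumption): t_cpu(n) = a * n, t_gpu(n) = t0 + b * n,
// where n is the number of pixels processed per call.
// Returns the pixel count at which both are predicted to take equal time;
// above it the GPU is predicted to be faster. Returns +infinity when the
// GPU's per-pixel cost is not lower than the CPU's.
double break_even_pixels(double t0, double a, double b)
{
    if (a <= b)
        return std::numeric_limits<double>::infinity();
    return t0 / (a - b);
}
```

A small image such as logo.png would then fall below n*, where the fixed cost t0 dominates, which is consistent with the GPU being slower there.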
Hi Chenjian-san,
Thank you for your reply.
Your information was very useful to me.
I confirmed that the GPU takes less time than the CPU when processing large data.
[Input]
https://upload.wikimedia.org/wikipedia/commons/4/45/Cliparts_%28examples%29.png
[Result]
ubuntu@tegra-ubuntu:~/hoge/sample_opencv_app/bin$ ./sample
cv::FAST : 25.5326 sec
cv::gpu::FAST_GPU : 3.62653 sec
ubuntu@tegra-ubuntu:~/hoge/sample_opencv_app/bin$
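Assuming the same TEST_LOOP = 1000 as in the test program, the per-call times show the GPU now about 7 times faster, whereas on logo.png it was about 9.8 times slower:

```latex
t_{\mathrm{CPU}} = \frac{25.5326\,\mathrm{s}}{1000} \approx 25.5\,\mathrm{ms}, \qquad
t_{\mathrm{GPU}} = \frac{3.62653\,\mathrm{s}}{1000} \approx 3.63\,\mathrm{ms}, \qquad
\frac{t_{\mathrm{CPU}}}{t_{\mathrm{GPU}}} \approx 7.0
```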
Thanks a lot,
makotoqnb
Hi,
I tried your code on my computer. When I built this project in Nsight, it reported:
make all -C /home/liang/cuda_workspace/fbflow_gpu/Debug
make: Entering directory `/home/liang/cuda_workspace/fbflow_gpu/Debug'
Building target: fbflow_gpu
Invoking: NVCC Linker
/usr/local/cuda-6.5/bin/nvcc --cudart static -L/opt/opencv/2.4.9/armv7l/lib -ccbin /usr/bin/arm-linux-gnueabihf-g++-4.8 --relocatable-device-code=false -gencode arch=compute_20,code=compute_20 -gencode arch=compute_20,code=sm_20 --target-cpu-architecture ARM -m32 -link -o "fbflow_gpu" ./info.o ./main.o ./test.o ./source/rgb2gray/rgb2gray.o ./source/rgb2gray/rgb2gray_caller.o ./source/mean/mean_n.o ./source/Negate/negate.o ./source/FSAT_CPU/fast.o ./source/FSAT_CPU/fast_9.o ./source/FSAT_CPU/nonmaxt.o ./source/FAST/FAST.o ./source/FAST/FAST_9_caller.o -lopencv_highgui -lopencv_features2d -lopencv_core -lopencv_imgproc
./main.o: In function `main':
/home/liang/cuda_workspace/fbflow_gpu/Debug/../main.cpp:122: undefined reference to `cv::gpu::FAST_GPU::FAST_GPU(int, bool, double)'
make: Leaving directory `/home/liang/cuda_workspace/fbflow_gpu/Debug'
/home/liang/cuda_workspace/fbflow_gpu/Debug/../main.cpp:125: undefined reference to `cv::gpu::FAST_GPU::operator()(cv::gpu::GpuMat const&, cv::gpu::GpuMat const&, cv::gpu::GpuMat&)'
/home/liang/cuda_workspace/fbflow_gpu/Debug/../main.cpp:140: undefined reference to `cv::gpu::FAST_GPU::operator()(cv::gpu::GpuMat const&, cv::gpu::GpuMat const&, cv::gpu::GpuMat&)'
collect2: error: ld returned 1 exit status
make: *** [fbflow_gpu] Error 1
> Shell Completed (exit code = 2)
Did you run into this problem?
liang
Oops, I have solved this problem. I added opencv_gpu to the linked libraries (-lopencv_gpu), and it worked!