Why is the OpenCV threshold function slower on the GPU than on the CPU?

Hi,

OpenCV - 3.3.1
Cuda toolkit - 8.0.84
Cudnn - 6.0

I've written code comparing the performance of the OpenCV functions cvtColor() and threshold() on the CPU and on the GPU.
I observed that cvtColor() is faster on the GPU, but cuda::threshold() is slower than the CPU version.
Why do some functions perform worse on the GPU?

My code is shown below.

##################################### cvtcolor #########################

start_color = std::chrono::steady_clock::now();
for (int k = 0; k < 10000; k++)
{
	cv::cvtColor(im1, im1_, CV_BGR2GRAY, 1);
	cv::cvtColor(im2, im2_, CV_BGR2GRAY, 1);
}
stop_color = std::chrono::steady_clock::now();

std::chrono::duration<float> time_interval = stop_color - start_color;
t = 1000 * time_interval.count();
std::cout << "run time for cv::cvtColor = " << (t / 20000) << " milli sec" << std::endl;

################################ cuda::cvtcolor #########################

start_colorcuda = std::chrono::steady_clock::now();
for (int k = 0; k < 10000; k++)
{
	cuda::cvtColor(im1_gpu, im1_gpu_, CV_BGR2GRAY, 1);
	cuda::cvtColor(im2_gpu, im2_gpu_, CV_BGR2GRAY, 1);
}
stop_colorcuda = std::chrono::steady_clock::now();

std::chrono::duration<float> time_interval_cu = stop_colorcuda - start_colorcuda;
t_cu = 1000 * time_interval_cu.count();
std::cout << "run time for cuda::cvtColor = " << (t_cu / 20000) << " milli sec" << std::endl;

####################################### threshold #########################

start_threshold = std::chrono::steady_clock::now();
for (int k = 0; k < 10000; k++)
{
	threshold(im1_, th1, 0, 1, THRESH_BINARY);
	threshold(im2_, th2, 0, 1, THRESH_BINARY);
}
stop_threshold = std::chrono::steady_clock::now();

std::chrono::duration<float> time_th = stop_threshold - start_threshold;
t_th = 1000 * time_th.count();
std::cout << "run time for threshold = " << (t_th / 20000) << " milli sec" << std::endl;

#################################### cuda::threshold #########################

start_threshcuda = std::chrono::steady_clock::now();
for (int k = 0; k < 10000; k++)
{
	cuda::threshold(im1_gpu_, th1_gpu, 0, 1, THRESH_BINARY);
	cuda::threshold(im2_gpu_, th2_gpu, 0, 1, THRESH_BINARY);
}
stop_threshcuda = std::chrono::steady_clock::now();

std::chrono::duration<float> time_thcu = stop_threshcuda - start_threshcuda;
t_thcu = 1000 * time_thcu.count();
std::cout << "run time for cuda::threshold = " << (t_thcu / 20000) << " milli sec" << std::endl;

The results are shown below:

run time for cv::cvtColor = 0.143568 milli sec
run time for cuda::cvtColor = 0.097032 milli sec
run time for threshold = 0.0138205 milli sec
run time for cuda::threshold = 0.0794818 milli sec

This very specific question would be best answered by the OpenCV developers, as they know the details of their implementation best.

There's an opencv-users mailing list hosted as a Yahoo! Group (opencv@yahoogroups.com).

And SourceForge hosts the opencvlibrary-devel mailing list.

Also, when unsure about aspects of speed, there's the NVIDIA Visual Profiler (nvvp) as well as the Nsight Visual Studio profiler (or the Nsight Eclipse edition). These tools can show you what's going on on a microsecond timeline, with a graphical user interface. Transferring data (especially image data) to the GPU and back is often the bottleneck; and for a kernel as cheap as a binary threshold, the fixed per-call launch and synchronization overhead can easily exceed the compute time itself, which is why a function that does very little work per pixel can end up slower on the GPU.