Dear Community,
I am computing a Gaussian filter on the Jetson Nano; the image size is 1920×1200. First I used OpenCV, which runs on the CPU. Then I wrote a CUDA version, but the CUDA version takes more time than OpenCV on the CPU. I can't understand why the CPU is faster than the GPU.
Here is the OpenCV (CPU) code:
cv::Mat smooth_mat(d_image_height_, d_image_width_, CV_8UC1, pattern_ptr);
cv::GaussianBlur(smooth_mat, smooth_mat, cv::Size(5, 5), 1, 1);
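For reference, a minimal sketch of running the same blur through OpenCV's own GPU filter, assuming your OpenCV build on the Nano has the CUDA cudafilters module enabled (smooth_mat is the same CV_8UC1 image as above):

#include <opencv2/core.hpp>
#include <opencv2/cudafilters.hpp>

cv::cuda::GpuMat d_src, d_dst;
d_src.upload(smooth_mat);  // host -> device copy
cv::Ptr<cv::cuda::Filter> gauss =
    cv::cuda::createGaussianFilter(CV_8UC1, CV_8UC1, cv::Size(5, 5), 1, 1);
gauss->apply(d_src, d_dst);  // blur runs on the GPU
d_dst.download(smooth_mat);  // device -> host copy

Note the explicit upload/download: for a single 1920×1200 frame, those two copies can cost as much as the filter itself.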
Here is the CUDA code:
int main(void)
{
    dim3 threadsPerBlock(8, 8);
    dim3 blocksPerGrid((d_image_width_ + threadsPerBlock.x - 1) / threadsPerBlock.x,
                       (d_image_height_ + threadsPerBlock.y - 1) / threadsPerBlock.y);
    const size_t num_bytes = d_image_height_ * d_image_width_ * sizeof(unsigned char);
    cudaMallocManaged((void **)&d_patterns_list_[serial_flag], num_bytes);
    // Blur into a separate buffer: passing the same pointer as src and dst races,
    // because each thread reads neighbours that other threads may already have overwritten.
    unsigned char *d_blurred = nullptr;
    cudaMallocManaged((void **)&d_blurred, num_bytes);
    // Some fill-data code is omitted here...
    kernel_gaussian_blur<<<blocksPerGrid, threadsPerBlock>>>(
        d_patterns_list_[serial_flag], d_blurred, d_image_height_, d_image_width_, gauss_filter_width);
    cudaDeviceSynchronize();
    return 0;
}
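One thing to check before comparing numbers: if the measured time includes cudaMallocManaged and the first kernel launch, you are mostly timing one-time CUDA initialization and managed-memory page migration, not the blur. A minimal sketch of timing only the kernel with CUDA events (d_src/d_dst stand in for the buffers above):

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

// Warm-up launch: the first launch pays one-time setup cost.
kernel_gaussian_blur<<<blocksPerGrid, threadsPerBlock>>>(d_src, d_dst, d_image_height_, d_image_width_, gauss_filter_width);
cudaDeviceSynchronize();

cudaEventRecord(start);
kernel_gaussian_blur<<<blocksPerGrid, threadsPerBlock>>>(d_src, d_dst, d_image_height_, d_image_width_, gauss_filter_width);
cudaEventRecord(stop);
cudaEventSynchronize(stop);

float ms = 0.0f;
cudaEventElapsedTime(&ms, start, stop);  // kernel time only, in milliseconds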
__global__ void kernel_gaussian_blur(const uchar* src, uchar* dst, int height, int width, int filterWidth)
{
    int y = blockDim.y * blockIdx.y + threadIdx.y;  // row
    int x = blockDim.x * blockIdx.x + threadIdx.x;  // column
    if (y >= height || x >= width)
    {
        return;
    }
    int ind = y * width + x;
    // Convolve with the Gaussian coefficients in constant memory,
    // clamping the neighbourhood to the image border.
    float color = 0.0f;
    for (int i = 0; i < filterWidth; i++)
    {
        for (int j = 0; j < filterWidth; j++)
        {
            int clamp_x = min(max(x + j - filterWidth / 2, 0), width - 1);
            int clamp_y = min(max(y + i - filterWidth / 2, 0), height - 1);
            color += d_const_Gaussian_5_5[i * filterWidth + j] * static_cast<float>(src[clamp_y * width + clamp_x]);
        }
    }
    dst[ind] = static_cast<uchar>(color + 0.5f);  // round instead of truncating
}
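A likely reason the naive kernel is slow on the Nano is that every thread re-reads its 5×5 neighbourhood from global memory, so each pixel is fetched up to 25 times per block. Below is a minimal sketch of a tiled variant that first stages each block's pixels plus a 2-pixel halo in shared memory (hard-coded for the 5×5 filter; launch it with the same 8×8 blocks):

constexpr int TILE = 8;    // must match threadsPerBlock
constexpr int RADIUS = 2;  // filterWidth / 2 for a 5x5 filter

__global__ void kernel_gaussian_blur_tiled(const uchar* src, uchar* dst, int height, int width)
{
    __shared__ uchar tile[TILE + 2 * RADIUS][TILE + 2 * RADIUS];

    // Cooperatively load the 12x12 tile (8x8 block + halo), clamping at the border.
    for (int ty = threadIdx.y; ty < TILE + 2 * RADIUS; ty += TILE)
    {
        for (int tx = threadIdx.x; tx < TILE + 2 * RADIUS; tx += TILE)
        {
            int gx = min(max(int(blockIdx.x) * TILE + tx - RADIUS, 0), width - 1);
            int gy = min(max(int(blockIdx.y) * TILE + ty - RADIUS, 0), height - 1);
            tile[ty][tx] = src[gy * width + gx];
        }
    }
    __syncthreads();

    int x = int(blockIdx.x) * TILE + threadIdx.x;
    int y = int(blockIdx.y) * TILE + threadIdx.y;
    if (x >= width || y >= height)
        return;

    // Each neighbour now comes from shared memory instead of global memory.
    float color = 0.0f;
    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 5; j++)
            color += d_const_Gaussian_5_5[i * 5 + j] *
                     static_cast<float>(tile[threadIdx.y + i][threadIdx.x + j]);

    dst[y * width + x] = static_cast<uchar>(color + 0.5f);
}

A 5×5 Gaussian is also separable, so splitting it into a horizontal and a vertical 1D pass (which is what cv::GaussianBlur does internally on the CPU) cuts the arithmetic from 25 to 10 multiplies per pixel.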