Does anyone have 2-D convolution CUDA code?

Does anyone have 2-D convolution CUDA code? I am not looking for the separable convolution or convolutionFFT2D,
just the naive, straightforward 2-D convolution in the spatial domain (not the frequency domain).

Thank you very much!

This is pretty old, untested, and unoptimized, so use it as a starting point.

KERNEL_X and KERNEL_Y are the coordinates of the kernel's center point.

I think I found it somewhere on this forum, in part anyway, so the credit doesn't all go to me!

Actually, seeing as there is a "double" in there, I probably found it somewhere else. Watch out for the double precision there; you might want to get rid of it.

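// DATA_W, DATA_H, KERNEL_W, KERNEL_H, KERNEL_X, KERNEL_Y, d_Kernel and texData
// are not defined in this snippet; they must be declared elsewhere.
__global__ void convolve(float* result)
{
	// Flatten a 2-D grid of 1-D blocks into one linear pixel index.
	const int idx = (blockIdx.y*blockDim.x*gridDim.x)+blockIdx.x*blockDim.x+threadIdx.x;
	const int y = idx/DATA_W;
	const int x = idx-y*DATA_W;

	// Threads past the end of the image do nothing.
	if(x<DATA_W&&y<DATA_H)
	{
		// Double-precision accumulator; switch to float if double is unwanted.
		double sum = 0;
		for(int ky = -(KERNEL_H - KERNEL_Y - 1); ky <= KERNEL_Y; ky++)
		{
			for(int kx = -(KERNEL_W - KERNEL_X - 1); kx <= KERNEL_X; kx++)
			{
				int dx = x + kx;
				int dy = y + ky;

				// Clamp to the image border.
				if(dx < 0) dx = 0;
				if(dy < 0) dy = 0;
				if(dx >= DATA_W) dx = DATA_W - 1;
				if(dy >= DATA_H) dy = DATA_H - 1;

				// The filter is indexed in reverse (true convolution, not correlation).
				float filterVal = d_Kernel[(KERNEL_Y - ky) * KERNEL_W + (KERNEL_X - kx)];
				// The +0.5f offset addresses the center of the texel.
				float pixelVal = tex2D(texData,(float)dx + 0.5f ,(float)dy + 0.5f);
				sum += filterVal * pixelVal;
			}
		}
		result[y * DATA_W + x] = (float)sum;
	}
}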
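
Since the kernel above is described as untested, a CPU reference can help check its output. The following is only a sketch of the same loop structure in plain host-side C++, not part of the original post. The function name convolveReference and its parameters are hypothetical: the sizes and center point are passed as arguments instead of the DATA_W/KERNEL_W-style constants, and the texture fetch is assumed to return the unfiltered pixel at (dx, dy).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference that mirrors the GPU kernel's loops.
// data is a row-major dataW x dataH image; kernel is row-major kernelW x kernelH.
// (kernelX, kernelY) is the kernel's center point, as KERNEL_X/KERNEL_Y above.
std::vector<float> convolveReference(const std::vector<float>& data, int dataW, int dataH,
                                     const std::vector<float>& kernel, int kernelW, int kernelH,
                                     int kernelX, int kernelY)
{
    std::vector<float> result(static_cast<std::size_t>(dataW) * dataH);
    for (int y = 0; y < dataH; ++y) {
        for (int x = 0; x < dataW; ++x) {
            double sum = 0.0;  // same accumulator type as the GPU kernel
            for (int ky = -(kernelH - kernelY - 1); ky <= kernelY; ++ky) {
                for (int kx = -(kernelW - kernelX - 1); kx <= kernelX; ++kx) {
                    // Clamp to the image border, as the GPU kernel does.
                    const int dx = std::min(std::max(x + kx, 0), dataW - 1);
                    const int dy = std::min(std::max(y + ky, 0), dataH - 1);
                    // Reversed filter index: true convolution, not correlation.
                    const float filterVal = kernel[(kernelY - ky) * kernelW + (kernelX - kx)];
                    // Assumption: stands in for tex2D(texData, dx + 0.5f, dy + 0.5f).
                    const float pixelVal = data[dy * dataW + dx];
                    sum += filterVal * pixelVal;
                }
            }
            result[y * dataW + x] = static_cast<float>(sum);
        }
    }
    return result;
}
```

To use it, copy the GPU result back to the host and compare it element by element against convolveReference, allowing a small tolerance if the accumulator on the GPU is changed to float. A quick sanity check is a 3x3 kernel that is zero except for a 1 at the center (kernelX = kernelY = 1), which should return the input unchanged.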

Thank you very much.