Does anyone have 2-D convolution CUDA code? Not the separable convolution or convolutionFFT2d,
just the naive, straightforward 2-D convolution in the spatial domain (not the frequency domain).
Thank you very much!! :rolleyes:
This is pretty old, untested, and unoptimized, so use it as a starting point.
KERNEL_X and KERNEL_Y are the coordinates of the kernel's center point.
I think I found it somewhere on this forum, in part anyway, so the credit doesn't all go to me!
Actually, seeing as there is a "double" in there, I probably found it somewhere else. Watch out for the double precision there; you might want to get rid of it.
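// DATA_W, DATA_H, KERNEL_W, KERNEL_H, KERNEL_X, KERNEL_Y, d_Kernel and texData
// are not defined in this snippet; they must be declared elsewhere.
__global__ void convolve(float* result)
{
    // Flat thread index; assumes 1-D thread blocks launched in a 2-D grid.
    const int idx = (blockIdx.y * blockDim.x * gridDim.x) + blockIdx.x * blockDim.x + threadIdx.x;
    // Convert the flat index into pixel coordinates (row-major).
    const int y = idx / DATA_W;
    const int x = idx - y * DATA_W;
    if (x < DATA_W && y < DATA_H)
    {
        double sum = 0; // double-precision accumulator; switch to float if not needed
        for (int ky = -(KERNEL_H - KERNEL_Y - 1); ky <= KERNEL_Y; ky++)
        {
            for (int kx = -(KERNEL_W - KERNEL_X - 1); kx <= KERNEL_X; kx++)
            {
                // Clamp the sample position to the image borders.
                int dx = x + kx;
                int dy = y + ky;
                if (dx < 0) dx = 0;
                if (dy < 0) dy = 0;
                if (dx >= DATA_W) dx = DATA_W - 1;
                if (dy >= DATA_H) dy = DATA_H - 1;
                // Flipped filter index, so this is a convolution rather than a correlation.
                float filterVal = d_Kernel[(KERNEL_Y - ky) * KERNEL_W + (KERNEL_X - kx)];
                // The +0.5f offset addresses the center of texel (dx, dy).
                float pixelVal = tex2D(texData, (float)dx + 0.5f, (float)dy + 0.5f);
                sum += filterVal * pixelVal;
            }
        }
        result[y * DATA_W + x] = (float)sum;
    }
}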
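Since the kernel above is untested, a CPU reference may help when checking its output. The following is only a sketch, not part of the original code: the function name convolveReference and all of its parameters are hypothetical, and it assumes the image and filter are stored row-major in plain float arrays. It uses the same loop bounds, border clamping, and flipped filter indexing as the kernel, but accumulates in float. It is plain C++ and should also compile as host code in a .cu file.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical CPU reference for the GPU kernel (names are illustrative).
// Assumes row-major storage; (kernelX, kernelY) is the filter's center point.
std::vector<float> convolveReference(const std::vector<float>& data,
                                     int dataW, int dataH,
                                     const std::vector<float>& kernel,
                                     int kernelW, int kernelH,
                                     int kernelX, int kernelY)
{
    std::vector<float> result(data.size());
    for (int y = 0; y < dataH; ++y) {
        for (int x = 0; x < dataW; ++x) {
            float sum = 0.0f;
            for (int ky = -(kernelH - kernelY - 1); ky <= kernelY; ++ky) {
                for (int kx = -(kernelW - kernelX - 1); kx <= kernelX; ++kx) {
                    // Clamp the sample position to the image borders.
                    int dx = std::min(std::max(x + kx, 0), dataW - 1);
                    int dy = std::min(std::max(y + ky, 0), dataH - 1);
                    // Flipped filter index: convolution, not correlation.
                    float filterVal =
                        kernel[(kernelY - ky) * kernelW + (kernelX - kx)];
                    sum += filterVal * data[dy * dataW + dx];
                }
            }
            result[y * dataW + x] = sum;
        }
    }
    return result;
}
```

To use it, copy the GPU result back to the host and compare it element by element against this function's output, allowing a small tolerance because the kernel accumulates in double while this sketch accumulates in float.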
Thank you very much.