I was looking for a quick way to apply a mean filter to an image and found this website, http://suja-cuda-optimization.synthasite.com, which takes the ConvolutionSeparable example from the SDK and optimizes it with some tweaks, such as separating the color channels into 3 arrays and then applying the filter to each one separately: http://suja-cuda-optimization.synthasite.c…—version2.php. I understand all of the code, with the exception of the filter algorithm, which I find very confusing.

This generates the Gaussian blur filter kernel:

```
// Sample a Gaussian at evenly spaced points across the window
for (i = 0; i < KERNEL_W; i++)
{
    float dist = (float)(i - KERNEL_RADIUS) / (float)KERNEL_RADIUS;
    h_Kernel[i] = expf(-dist * dist / 2);
    kernelSum += h_Kernel[i];
}
// Normalize so the weights sum to 1
for (i = 0; i < KERNEL_W; i++)
    h_Kernel[i] /= kernelSum;
```

This is the reference row convolution filter, to compare with its GPU counterpart:

```
extern "C" void convolutionRowCPU(float *h_Result, float *h_Data, float *h_Kernel, int dataW, int dataH, int kernelR)
{
    int x, y, k, d;
    float sum;

    for (y = 0; y < dataH; y++)
        for (x = 0; x < dataW; x++) {
            sum = 0;
            // Accumulate the weighted neighbors along the row
            for (k = -kernelR; k <= kernelR; k++) {
                d = x + k;
                // Taps that fall outside the image are skipped
                if (d >= 0 && d < dataW)
                    sum += h_Data[y * dataW + d] * h_Kernel[kernelR - k];
            }
            h_Result[y * dataW + x] = sum;
        }
}
```

And the column one:

```
extern "C" void convolutionColumnCPU(float *h_Result, float *h_Data, float *h_Kernel, int dataW, int dataH, int kernelR)
{
    int x, y, k, d;
    float sum;

    for (y = 0; y < dataH; y++)
        for (x = 0; x < dataW; x++) {
            sum = 0;
            // Accumulate the weighted neighbors along the column
            for (k = -kernelR; k <= kernelR; k++) {
                d = y + k;
                // Taps that fall outside the image are skipped
                if (d >= 0 && d < dataH)
                    sum += h_Data[d * dataW + x] * h_Kernel[kernelR - k];
            }
            h_Result[y * dataW + x] = sum;
        }
}
```

Initially, I thought modifying the filter kernel (not the CUDA kernel) would be sufficient, but it seems it’s more complicated than that. Perhaps someone can help me figure out how to “downgrade” the code to a mean filter.
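To make the idea concrete, here is a minimal sketch of what I mean by modifying only the filter kernel: every weight is set to 1 / KERNEL_W instead of a Gaussian sample. The value KERNEL_RADIUS = 2 and the names makeMeanKernel and rowFilter are placeholders of mine, not taken from the SDK or the website; rowFilter just restates the reference row loop so the snippet compiles on its own.

```c
/* Hypothetical stand-ins for the SDK's compile-time constants. */
#define KERNEL_RADIUS 2
#define KERNEL_W (2 * KERNEL_RADIUS + 1)

/* Fill the filter kernel with equal weights that sum to 1 (box/mean filter).
 * Assumption: this replaces the Gaussian generation loop and nothing else. */
static void makeMeanKernel(float *h_Kernel, int kernelW)
{
    int i;
    for (i = 0; i < kernelW; i++)
        h_Kernel[i] = 1.0f / (float)kernelW;
}

/* Same logic as the reference row filter, restated so the sketch is
 * self-contained. Out-of-range taps are skipped, so they count as zero. */
static void rowFilter(float *h_Result, const float *h_Data,
                      const float *h_Kernel, int dataW, int dataH, int kernelR)
{
    int x, y, k, d;
    float sum;

    for (y = 0; y < dataH; y++)
        for (x = 0; x < dataW; x++) {
            sum = 0;
            for (k = -kernelR; k <= kernelR; k++) {
                d = x + k;
                if (d >= 0 && d < dataW)
                    sum += h_Data[y * dataW + d] * h_Kernel[kernelR - k];
            }
            h_Result[y * dataW + x] = sum;
        }
}
```

With uniform weights, an interior pixel gets the plain average of its KERNEL_W neighbors. Near the borders the skipped taps still count in the divisor, so the result there is lower than a true mean of the in-range pixels; I am not sure whether that is the complication I am running into.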

Thank you.