Gaussian recursive filter and box filter


I was wondering: has anybody tried to implement a recursive Gaussian filter with CUDA, like this one:

I’m quite new to GPGPU, but I think it should be easy to implement starting from the Box filter sample in the SDK (processing all rows/columns in parallel, with one thread per row/column),
and it may lead to a more efficient convolution scheme than the other one provided in the sample code (separable, with shared memory and texture memory).


Yes, I’ve been planning to extend the boxfilter sample in the SDK to do a recursive Gaussian filter as described in that paper (IIR style).

It shouldn’t be difficult. The kernel is memory-bound, so the extra maths probably won’t affect the performance much.

Okay, here’s an implementation. Don’t say we never do anything for you :)

This was based on the Deriche filter code from the CImg library, which I believe was developed by Inria:

I haven’t tried optimizing it much, but in practice it appears to run at about half the speed of the boxfilter sample (about 350 fps for a 512x512 image on my machine). But this holds for a filter of any width, which is nice.

The problem in general with this kind of “thread-per-row” filter is that the amount of parallelism is limited by the image size.
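
To put numbers on that, here is a tiny sketch in plain C++ that computes how many threads and blocks a row pass would use. The struct `LaunchShape`, the function `rowPassShape` and the choice of 64 threads per block are assumptions for the example, not values taken from the attached code.

```cpp
// Hypothetical description of a 1D launch for a thread-per-row pass.
struct LaunchShape {
    int threads;  // total independent threads = number of rows
    int blocks;   // blocks needed at the given block size
};

// One thread per row: the thread count equals the image height,
// no matter how wide the image is.
LaunchShape rowPassShape(int imageHeight, int threadsPerBlock)
{
    LaunchShape s;
    s.threads = imageHeight;
    s.blocks = (imageHeight + threadsPerBlock - 1) / threadsPerBlock;  // round up
    return s;
}
```

For a 512x512 image with 64 threads per block this gives 512 threads in 8 blocks per pass, compared with 262,144 threads if every pixel had its own thread.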

Thanks a lot!!
Actually, I already use CImg for my work, and I was just now trying to adapt it to CUDA as well.