Urgent! Help with understanding the parallel algorithm used. This is regarding the convolutionSe

Hey guys,

I’m trying to implement Separable Convolution with a Gaussian Filter.
So, the first step was to read the whitepaper that ships with the SDK (convolutionSeparable.pdf, found here: http://developer.download.nvidia.com/compu…nSeparable.pdf).

I understood most of it, and then moved on to the CUDA code.

However, I can’t really follow the approach used to perform the convolution. What is the parallel algorithm that the sample implements? Even a basic description of the algorithm used in the SDK would allow us to write our own code.

For example, the Parallel Prefix Sum (Scan) whitepaper in the SDK (scan.pdf, found here: http://developer.download.nvidia.com/compu…n/doc/scan.pdf) clearly explains the parallel algorithm used in the CUDA code, so it at least gives us a starting point for writing the code.

So, if anyone could explain, at least in simple words, the basic parallel algorithm used in the convolutionSeparable SDK code, or point me to any links about it, that would be very helpful.

Thanks in advance!

Can’t help you much with understanding convolutionSeparable, but I found the recursiveGaussian sample quite interesting as well. It is very compute efficient for large blur kernels. It approximates a Gaussian filter using a 3rd-order IIR filter, which is applied in a forward and a backward pass: first on each pixel column of the image, then on each row.

The code was understandable to me, at least. The name “recursive” is a bit misleading. All that’s recursive about it is that the filter has, in theory, an infinite response length. I’d have named it separableIIRGaussian or something like that.


Alright, I shall have a look at that. Meanwhile, do you know what the mean and variance of the Gaussian filter used in convolutionSeparable are? (E.g., is it a zero-mean filter?)

Also, speaking in general, when we start writing CUDA code, do we work out a parallel algorithm before we start coding, or do we just start writing the code? (This question may seem silly, sorry, I’m a newbie when it comes to CUDA and parallel programming.) I’m just trying to identify the right way to start on Separable Convolution…

I know most aspects of CUDA programming now, so I try to make up my mind about code structure for efficient execution before I start writing any code.

Most beginners will want to take the naive (simple) approach first, and optimize from there… In some cases it means scrapping the first approach. As a minimal first step you have to think about some kind of thread structure (blocks, grid), because some level of parallelism needs to be achieved.

Cool. Thanks for the info. In any case, do you happen to know about the Mean and Variance of the Gaussian Filter used in the convolutionSeparable SDK code?

Nope. Not me. Never looked at that sample in detail.

Think I found my answer here: http://homepages.inf.ed.ac.uk/rbf/HIPR2/gsmooth.htm


"Figure 2 shows 2-D Gaussian distribution with mean (0,0) and Standard Deviation=1

The idea of Gaussian smoothing is to use this 2-D distribution as a 'point-spread' function, and this is achieved by convolution. Since the image is stored as a collection of discrete pixels we need to produce a discrete approximation to the Gaussian function before we can perform the convolution. In theory, the Gaussian distribution is non-zero everywhere, which would require an infinitely large convolution kernel, but in practice it is effectively zero more than about three standard deviations from the mean, and so we can truncate the kernel at this point. Figure 3 shows a suitable integer-valued convolution kernel that approximates a Gaussian with a Standard Deviation of 1.0."
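For anyone who wants to build such a kernel themselves, here is a small sketch of the truncation described in the quote. It produces a floating-point 1-D kernel (not the integer-valued one from Figure 3) with mean 0, cut off at about three standard deviations and normalized so the weights sum to 1. The function name `makeGaussianKernel` is made up, and I have not checked whether the SDK sample builds its kernel this way.

```cpp
#include <cmath>
#include <vector>

// Build a normalized 1-D Gaussian kernel with mean 0 and standard deviation
// sigma (must be > 0), truncated at about three standard deviations on each
// side. The 3-sigma radius follows the quoted text; it is an illustrative
// choice, not taken from the SDK sample.
std::vector<float> makeGaussianKernel(float sigma) {
    const int radius = static_cast<int>(std::ceil(3.0f * sigma));
    std::vector<float> kernel(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        const float w = std::exp(-(i * i) / (2.0f * sigma * sigma));
        kernel[i + radius] = w;
        sum += w;
    }
    // Normalize so the filter does not change overall image brightness.
    for (float& w : kernel) w /= sum;
    return kernel;
}
```

Because the 2-D Gaussian is the product of two 1-D Gaussians, the same 1-D kernel can be used for both the row pass and the column pass of a separable convolution.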