I’m trying to do a Gaussian blur on an image (in OpenCL), but all the algorithms I found are for a separable Gaussian (the blur is done horizontally, then vertically), i.e. two 1-dimensional operations (e.g. the NVIDIA SDK).

I’m looking for how to perform a single-pass, 2-dimensional Gaussian blur.

Gaussian blur is separable, so why would you want to do it in 2D? The separable version will give exactly the same results and is much more efficient.

If you want to do some other non-separable 2D convolution, then I understand. I don’t think we have any samples in the SDK, although NPP will do this: http://developer.nvidia.com/object/npp_home.html

You can pull off running the two separable passes in a single CUDA kernel, by the way. It does require some extra work on the first pass, but it avoids a second copy into shared memory.

For non-separable filters, look for a talk Joe Stam gave about convolutions at last year’s GTC (I forget the title, sorry).

I need to do it in 2D because my application chains several 2D operations on an image
(geometry adjustment, then Gaussian blur, then color adjustment, then …).
Since those operations are 2D and I want to do all of them in the same kernel, I need the Gaussian to be 2D as well
(you have to define the kernel’s dimensions when enqueuing it).

please tell me if I’m wrong.

I’m working with OpenCL; is NPP compatible with it?

This won’t work because you’re only able to do work-group level synchronization. As soon as your global work size is greater than your local work size, there’s no way to be sure that operation results of work-items from another work-group have been committed to memory. Thus, as soon as your kernel uses neighbouring pixels in its calculation, you have a problem with synchronization which makes it impossible to avoid multiple kernels.

In my experience, incorrect results caused by this issue are often not easily visible because they affect only a handful of pixels, if any. For blurring in particular, it is probably not even noticeable to the eye, since the before/after images look only slightly different.

(The only exception is if blurring is the very first operation performed, followed by operations that work solely on a single pixel and do not incorporate neighbours.)

Actually it can work; you just need to differentiate between input pixels and valid output pixels. As I said, it’s even possible to perform this as a separable filter in a single kernel, although at the cost of a few more operations than separate kernels.

You need to make sure, though, to work in shared memory if possible and to write to a different buffer than the input one (this won’t work in place).

If you need to do an arbitrary, non-separable 2D convolution and if you have a fixed mask size and constant mask coefficients, you can modify the oclSobelFilter sample in the NVIDIA GPU Computing SDK to be a Gaussian blur.

If you need variable mask coefficients instead of constant ones, the additional modification needed is to pass them by pointer as a kernel argument, either __global or __constant. The oclConvolutionSeparable sample shows how to do this part (for a 1D array, but the principle is the same) if an example is needed.
