CUDA OpenGL post-processing example

A few people have asked about this, so I’m attaching a simple example that demonstrates how to transfer image data back and forth between OpenGL and CUDA.

It performs a 2D convolution on an image of a simple 3D scene rendered by OpenGL.

Note that the image processing code is not really optimized for performance.
postProcessGL.zip (323 KB)

Instead of shared memory, couldn't a texture have been used for the convolution image as well? Since the convolution is done per pixel and the pixels in a block are spatially close, I assumed a texture would provide an easy implementation together with automatic caching.

That’s true, although unfortunately there’s currently no way to read directly from a texture allocated by OpenGL.

You can use texture lookups in CUDA, but only into arrays allocated by CUDA itself.

We hope to lift this restriction in a future release.

ah ok :)

Do you perhaps know the difference in execution time between a texture-based convolution and a shared-memory-based convolution?

Using a texture would also not provide the data re-use benefits that shared memory affords. The larger the filter kernel, the more threads re-use the same data elements.

Mark

In our tests CUDA-based convolution using shared memory is about 2x faster than the equivalent OpenGL texture-based solution.

For example, it’s possible to do a 5x5 convolution at >1Gpixel/sec.
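For reference, here is a plain-CPU sketch of what such a 5x5 convolution computes (the function name and the clamp-to-edge border policy are my assumptions, not necessarily what the SDK sample does):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// CPU reference for a 5x5 convolution on a single-channel float image.
// Reads outside the image clamp to the nearest edge pixel (an assumed
// border policy; the SDK sample may handle borders differently).
std::vector<float> convolve5x5(const std::vector<float>& img, int w, int h,
                               const float kernel[5][5]) {
    std::vector<float> out(img.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int ky = -2; ky <= 2; ++ky)
                for (int kx = -2; kx <= 2; ++kx) {
                    int sx = std::min(std::max(x + kx, 0), w - 1);
                    int sy = std::min(std::max(y + ky, 0), h - 1);
                    sum += img[sy * w + sx] * kernel[ky + 2][kx + 2];
                }
            out[y * w + x] = sum;
        }
    }
    return out;
}
```

On the GPU, each thread block first stages its tile of pixels (plus a 2-pixel apron on each side) into shared memory, so the 25 reads per output pixel mostly hit shared memory instead of device memory.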

How about a convolution with a really big kernel, say 64x64, when the FFT approach is not possible?

The shared memory approach runs into trouble there, since shared memory is not big enough to hold the kernel coefficients. How can this be solved without cache misses?

Would it, for example, be more efficient to implement four shared-memory convolutions with 32x32 kernels than one big texture/constant-cache-based convolution with a 64x64 kernel? I think it would, but perhaps someone has already implemented this.

The kernel coefficients should be stored in constant memory, so they are not a problem. However, you are correct that the pixel data would be hard to fit in shared memory for arbitrarily large filters.
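To put numbers on that (my own back-of-the-envelope figures, assuming 16x16 thread blocks, single-channel float pixels, and G80-class limits of 16 KB shared memory and 64 KB constant memory):

```cpp
#include <cassert>
#include <cstddef>

// Sizing for a non-separable ~64x64 filter with 16x16 thread blocks.
// Each block must stage its output tile plus an apron of surrounding
// pixels; the filter coefficients themselves live in constant memory.
constexpr int kBlock  = 16;                  // threads per block edge
constexpr int kRadius = 32;                  // apron radius for a ~64x64 filter
constexpr int kTile   = kBlock + 2 * kRadius;             // 80 pixels per edge
constexpr std::size_t kTileBytes = kTile * kTile * sizeof(float);  // 25600 B
constexpr std::size_t kCoefBytes = 64 * 64 * sizeof(float);        // 16384 B
```

So a single block's tile needs about 25 KB, well over the 16 KB of shared memory, while the 16 KB of coefficients fit comfortably in the 64 KB constant memory.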

Note also that some filters, such as Gaussian blur kernels, are separable. Large separable filters fit in shared memory much more easily. We will include an example of such a convolution in the next release of the SDK.
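A CPU sketch of the separability idea (plain C++ for clarity; the function names and clamp-to-edge borders are my choices, not the SDK's): a separable 2D filter is the outer product of a 1D kernel with itself, so it can be applied as a row pass followed by a column pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Clamp-to-edge pixel fetch for a single-channel float image.
static float at(const std::vector<float>& img, int w, int h, int x, int y) {
    x = std::min(std::max(x, 0), w - 1);
    y = std::min(std::max(y, 0), h - 1);
    return img[y * w + x];
}

// Horizontal pass with a 1D kernel k of odd length.
std::vector<float> convolveRows(const std::vector<float>& img, int w, int h,
                                const std::vector<float>& k) {
    int r = (int)k.size() / 2;
    std::vector<float> out(img.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float s = 0.0f;
            for (int i = -r; i <= r; ++i) s += at(img, w, h, x + i, y) * k[i + r];
            out[y * w + x] = s;
        }
    return out;
}

// Vertical pass with the same 1D kernel; rows-then-columns equals the
// full 2D convolution with the outer-product kernel.
std::vector<float> convolveCols(const std::vector<float>& img, int w, int h,
                                const std::vector<float>& k) {
    int r = (int)k.size() / 2;
    std::vector<float> out(img.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float s = 0.0f;
            for (int i = -r; i <= r; ++i) s += at(img, w, h, x, y + i) * k[i + r];
            out[y * w + x] = s;
        }
    return out;
}
```

For a k-tap filter this cuts the work per pixel from k² multiply-adds to 2k, and each 1D pass only needs a 1D apron in shared memory instead of a 2D one.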

Mark

Hi Simon,

Thanks for the estimate; the example in the SDK is a separable kernel. Could you eventually make the full 5x5 convolution code available?

Thanks and regards,

John

When will you guys release it? We are all eagerly waiting :playball: