Using shared memory for bilinear filter

e.ping · March 12, 2008, 6:34am

I wrote a CUDA program that takes an arbitrarily sized RGBA image and copies it into a smaller image. While doing this, it must apply a sort of “bilinear filter” to average the pixels. The operation is quite simple:

Determine the scale (ratio) of the src vs. dst image (i.e., 8x8 to 4x4 is scale 2).
For each source pixel, divide by scale^2 and add the result to the corresponding destination pixel. So in the example above, we’d take 4 pixels, scale each one by 0.25, and add them together to produce the result pixel.
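A minimal serial C++ sketch of the operation as described, for one channel (RGBA would repeat it per channel). It assumes an integer scale that divides both dimensions, and the function name is hypothetical. The `+=` is only safe because this runs on one CPU thread; with one GPU thread per source pixel, `scale * scale` threads would update each destination element concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial "scatter" downscale of one channel (row-major floats).
// Each source pixel adds src / scale^2 into the destination pixel it falls in.
// Hypothetical helper; assumes scale divides srcW and srcH exactly.
std::vector<float> downscaleScatter(const std::vector<float>& src,
                                    int srcW, int srcH, int scale) {
    assert(scale > 0 && srcW % scale == 0 && srcH % scale == 0);
    const int dstW = srcW / scale;
    const int dstH = srcH / scale;
    const float weight = 1.0f / static_cast<float>(scale * scale);  // 0.25 for scale 2
    std::vector<float> dst(static_cast<std::size_t>(dstW) * dstH, 0.0f);
    for (int y = 0; y < srcH; ++y)
        for (int x = 0; x < srcW; ++x)
            // Several source pixels map to the same dst index: a data race
            // if each iteration were a separate GPU thread.
            dst[(y / scale) * dstW + (x / scale)] += src[y * srcW + x] * weight;
    return dst;
}
```

For an 8x8 image whose pixel values are 0..63 in row-major order, destination pixel (0, 0) averages source pixels 0, 1, 8 and 9, giving 4.5.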

I’m running one thread per source pixel. The problem, of course, is that this is not thread safe at all: many threads read from and write to the same destination pixel. It’s also going to be terrible for memory coalescing.

I think what I need to do is allocate some shared memory and break this up into two passes. I’d like some feedback on what a good approach would be. I’m almost thinking I should rewrite it to run one thread per destination pixel and do a gather operation instead. That won’t be well coalesced either, though.

MisterAnderson42 · March 12, 2008, 12:32pm

Run one thread per destination pixel. Each thread then reads whatever source pixels it needs through 2D texture reads, and coalescing the write is easy.
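A host-side C++ sketch of the gather formulation: `gatherOnePixel` computes what a single thread (dx, dy) would, and the loops in `downscaleGather` stand in for a 2D launch with one thread per destination pixel. For simplicity it indexes a plain array instead of doing texture reads; the names are hypothetical, and an integer scale that divides both dimensions is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work of one "thread" (dx, dy): read its scale x scale source block,
// average it, and return the single value this thread writes.
float gatherOnePixel(const std::vector<float>& src, int srcW,
                     int dx, int dy, int scale) {
    float sum = 0.0f;
    for (int j = 0; j < scale; ++j)
        for (int i = 0; i < scale; ++i)
            sum += src[(dy * scale + j) * srcW + (dx * scale + i)];
    return sum / static_cast<float>(scale * scale);
}

// Host loops standing in for a 2D grid with one thread per dst pixel.
std::vector<float> downscaleGather(const std::vector<float>& src,
                                   int srcW, int srcH, int scale) {
    assert(scale > 0 && srcW % scale == 0 && srcH % scale == 0);
    const int dstW = srcW / scale;
    const int dstH = srcH / scale;
    std::vector<float> dst(static_cast<std::size_t>(dstW) * dstH);
    for (int dy = 0; dy < dstH; ++dy)
        for (int dx = 0; dx < dstW; ++dx)
            dst[dy * dstW + dx] = gatherOnePixel(src, srcW, dx, dy, scale);  // one writer per element
    return dst;
}
```

Each destination element has exactly one writer, so no atomics or second pass are needed. If dx is mapped to threadIdx.x, consecutive threads write consecutive dst addresses, which is the layout coalesced writes need.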

kristleifur · March 12, 2008, 12:48pm

Check out the convolutionSeparable example in the SDK.
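The box average used here is separable, so, in the spirit of convolutionSeparable (which splits a 2D filter into a row pass and a column pass), the downscale can be done as a horizontal pass followed by a vertical pass, each of which could be its own kernel with one thread per output element. A minimal host-side C++ sketch; the function names are hypothetical and an integer scale dividing both dimensions is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pass 1 (rows): average each run of `scale` pixels along a row.
// Input w x h, output (w/scale) x h.
std::vector<float> boxRows(const std::vector<float>& src, int w, int h, int scale) {
    const int outW = w / scale;
    std::vector<float> out(static_cast<std::size_t>(outW) * h);
    for (int y = 0; y < h; ++y)
        for (int ox = 0; ox < outW; ++ox) {
            float sum = 0.0f;
            for (int i = 0; i < scale; ++i) sum += src[y * w + ox * scale + i];
            out[y * outW + ox] = sum / static_cast<float>(scale);
        }
    return out;
}

// Pass 2 (columns): average each run of `scale` rows.
// Input w x h, output w x (h/scale).
std::vector<float> boxCols(const std::vector<float>& src, int w, int h, int scale) {
    const int outH = h / scale;
    std::vector<float> out(static_cast<std::size_t>(w) * outH);
    for (int oy = 0; oy < outH; ++oy)
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int j = 0; j < scale; ++j) sum += src[(oy * scale + j) * w + x];
            out[oy * w + x] = sum / static_cast<float>(scale);
        }
    return out;
}

std::vector<float> downscaleSeparable(const std::vector<float>& src,
                                      int w, int h, int scale) {
    assert(scale > 0 && w % scale == 0 && h % scale == 0);
    std::vector<float> tmp = boxRows(src, w, h, scale);  // (w/scale) x h
    return boxCols(tmp, w / scale, h, scale);            // (w/scale) x (h/scale)
}
```

This gives the same result as averaging each scale x scale block directly. For a pure box downscale it does not reduce the number of source reads compared with the one-thread-per-destination gather; the benefit is that each pass is a simple 1D average, which is the two-pass structure mentioned in the first post.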

Also, have you looked into using textures? You get bilinear interpolation ‘for free’ when you read from textures. There’s an example for that, too.

e.ping · March 16, 2008, 11:47pm

This is the first thing I looked into, but oddly enough, CUDA does not seem to support any fancy texturing modes. Straight from the docs:

These functions fetch the region of linear memory bound to texture reference texRef using texture coordinate x. No texture filtering and addressing modes are supported. For integer types, these functions may optionally promote the integer to 32-bit floating point.

Also, even if it did support bilinear filtering, when copying into a texture that’s less than 50% of the size of the original, the filtering would no longer be correct, since the bilinear blend only considers the 4 neighbouring texels.

I’ll look into the convolutionSeparable sample.

MisterAnderson42 · March 16, 2008, 11:52pm

You have a point there.

But bilinear filtering among the 4 neighboring elements of the array CAN be done in CUDA. The part of the manual you quoted refers only to 1D textures bound to device (linear) memory. Check again under the section on 2D (or 1D) textures bound to CUDA array memory (i.e., cudaBindTextureToArray). There aren’t as many texturing modes as in OpenGL, but you can do bilinear filtering, clamp or wrap coordinates, or use normalized coordinates.
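For reference, the Texture Fetching appendix of later versions of the CUDA Programming Guide (the wording may differ from the 2008 manual quoted in this thread) gives the linear-filtering result of a 2D fetch at unnormalized coordinates (x, y) as follows, with T[i, j] the texel in column i and row j. The last line is a worked consequence, not a quote: a fetch at an integer corner shared by four texels weights each of them by 1/4.

```latex
\[ x_B = x - 0.5,\quad y_B = y - 0.5,\quad i = \lfloor x_B \rfloor,\quad j = \lfloor y_B \rfloor,\quad \alpha = x_B - i,\quad \beta = y_B - j \]
\[ \mathrm{tex}(x, y) = (1-\alpha)(1-\beta)\,T[i, j] + \alpha(1-\beta)\,T[i+1, j] + (1-\alpha)\beta\,T[i, j+1] + \alpha\beta\,T[i+1, j+1] \]
% At (x, y) = (k+1, l+1): x_B = k + 0.5, so i = k and alpha = 0.5 (likewise j = l, beta = 0.5)
\[ \mathrm{tex}(k+1,\, l+1) = \tfrac{1}{4}\bigl(T[k, l] + T[k+1, l] + T[k, l+1] + T[k+1, l+1]\bigr) \]
```

So for scale 2, one fetch at (2dx+1, 2dy+1) per destination pixel reproduces the 2x2 box average; the same guide states that α and β are stored in 9-bit fixed point with 8 fractional bits, which represents 0.5 exactly. For scale 4, a single fetch per destination pixel reads only 4 of the 16 source texels, which is the limitation raised above for targets under 50% of the original size.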

e.ping · March 16, 2008, 11:53pm

I take that back: the docs are a bit unclear, but it seems some basic linear and bilinear filtering is supported.

Linear texture filtering may be done only for textures that are configured to return floating-point data. It performs low-precision interpolation between neighboring texels. When enabled, the texels surrounding a texture fetch location are read and the return value of the texture fetch is interpolated based on where the texture coordinates fell between the texels. Simple linear interpolation is performed for one-dimensional textures and bilinear interpolation is performed for two-dimensional textures.

The problem here is that a) I’d have to convert the floating-point result back to integer, and b) it only blends the 4 nearest texels, so copying to a very small texture would lead to incorrect results.
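One possible workaround for both points, not something proposed in this thread: for an even integer scale, take one bilinear fetch at the shared corner of each 2x2 sub-block (where all four weights are 0.25), average the (scale/2)^2 fetches, and convert the result back to 8 bits. The sketch below emulates the fetch on the CPU in plain C++ (full-precision weights, clamp addressing, unnormalized coordinates). It assumes the 8-bit texture is read as normalized floats in [0, 1], as cudaReadModeNormalizedFloat does; `bilinearFetch` and `downscaleTaps` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// CPU emulation of one linear-filtered 2D fetch (unnormalized coordinates,
// clamp addressing) on a single-channel row-major image.
// Uses full float weights; the hardware's weights are lower precision.
float bilinearFetch(const std::vector<float>& t, int w, int h, float x, float y) {
    const float xb = x - 0.5f, yb = y - 0.5f;
    const int i = static_cast<int>(std::floor(xb));
    const int j = static_cast<int>(std::floor(yb));
    const float a = xb - static_cast<float>(i);  // horizontal weight
    const float b = yb - static_cast<float>(j);  // vertical weight
    auto at = [&](int cx, int cy) {
        cx = std::min(std::max(cx, 0), w - 1);   // clamp addressing
        cy = std::min(std::max(cy, 0), h - 1);
        return t[cy * w + cx];
    };
    return (1 - a) * (1 - b) * at(i, j) + a * (1 - b) * at(i + 1, j)
         + (1 - a) * b * at(i, j + 1) + a * b * at(i + 1, j + 1);
}

// Destination pixel (dx, dy) for an even integer scale: one fetch per 2x2
// sub-block, then a plain average of those fetches, converted to 0..255.
std::uint8_t downscaleTaps(const std::vector<float>& t, int w, int h,
                           int dx, int dy, int scale) {
    assert(scale >= 2 && scale % 2 == 0);
    const int taps = scale / 2;                  // taps per axis
    float sum = 0.0f;
    for (int j = 0; j < taps; ++j)
        for (int i = 0; i < taps; ++i)
            sum += bilinearFetch(t, w, h,
                                 static_cast<float>(dx * scale + 2 * i + 1),
                                 static_cast<float>(dy * scale + 2 * j + 1));
    const float avg = sum / static_cast<float>(taps * taps);
    // Back to 8-bit: clamp to [0, 1], scale by 255, round to nearest.
    const float v = std::min(std::max(avg, 0.0f), 1.0f) * 255.0f + 0.5f;
    return static_cast<std::uint8_t>(v);
}
```

With a single bright texel at (0, 0) and scale 4, the four taps recover the true block average of 1/16, while a single fetch at the block centre (2, 2) blends only texels (1..2, 1..2) and misses it entirely.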
