Shared memory & overlapping tiles for image processing: optimizing computation and maximizing data reuse


I want to perform convolution/morphological operators using shared memory, for obvious reasons.
My operators are a bit complex, but behave like, let's say, two 3x3 convolutions or one 5x5 convolution.
I want to implement the 5x5 convolution (for example, a standard 5x5 erosion) as two cascaded 3x3 convolutions:
the two 3x3 convolutions have a lower complexity than the single 5x5 (2x9 = 18 operations per pixel instead of 25). That's the reason why I want to process the data this way.

my original 5x5 convolution looks like
Y[i][j] = X[i-2][j-2] AND X[i-2][j-1] AND … AND X[i-2][j+2]
          AND …
          AND X[i+2][j-2] AND … AND X[i+2][j+2]

each of the two 3x3 convolutions looks like
Y[i][j] = X[i-1][j-1] AND X[i-1][j] AND X[i-1][j+1]
          AND X[i][j-1] AND X[i][j] AND X[i][j+1]
          AND X[i+1][j-1] AND X[i+1][j] AND X[i+1][j+1]

but while the iteration space of the second convolution is (i,j) in [0…h-1]x[0…w-1], with h and w the tile's size,
the iteration space of the first one is [-1…h]x[-1…w].
Let's assume there is a hidden offset-addressing scheme that maps this onto zero-based addressing compatible with shared memory (that is: [0…h+2-1]x[0…w+2-1]).

In order to save time, the data are stored in shared memory before, during, and after processing.

My problem is how to tell the thread block to use an iteration space of [-1…h]x[-1…w],
so that the processing scheme is
first step: (h+4)x(w+4) tile -> (h+2)x(w+2) tile [first convolution]
second step: (h+2)x(w+2) tile -> (h)x(w) tile [second convolution]

This processing scheme is different from the "separable convolution" paper from NVIDIA, as I want to store the intermediate results in shared memory.

So, is it possible to implement such a scheme with CUDA?

Thanks in advance