N-Sample moving average filter

Jaiyk · December 4, 2008, 7:05pm

Hi all-

Using CUDA, how would you efficiently implement an n-sample moving average filter for a 1D array of integers? For discussion, let’s say the array is 2^16 integers long, and that we want “n” (the window length) to be 32. I’m not used to “thinking parallel” – any tips on how to allocate threads to this problem would be appreciated.

cbuchner1 · December 4, 2008, 9:16pm

Not sure if this is too helpful, but the BoxFilter SDK sample applies a separable moving average filter over a 2D image (separable means the filter is applied separately in the horizontal and vertical directions). You may get some ideas from this SDK sample.
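For a single row, a box filter can be evaluated with a running sum. After the first window, each output costs one add and one subtract no matter how wide the window is. Below is a serial C++ sketch of that idea. It is not code from the SDK sample. The function name boxFilterRow is hypothetical, and the function computes only the "valid" outputs, where the whole window lies inside the row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Horizontal box filter over one row using a running sum: after the first
// window, each output costs one add and one subtract, whatever n is.
// Output j is the mean of row[j .. j+n-1] ("valid" outputs only, no edge handling).
std::vector<float> boxFilterRow(const std::vector<int>& row, std::size_t n) {
    assert(n > 0 && n <= row.size());
    long long sum = 0;
    for (std::size_t k = 0; k < n; ++k) sum += row[k];  // first window
    std::vector<float> out(row.size() - n + 1);
    const float width = static_cast<float>(n);
    out[0] = static_cast<float>(sum) / width;
    for (std::size_t j = 1; j < out.size(); ++j) {
        // Slide the window one step: add the newest sample, drop the oldest.
        sum += static_cast<long long>(row[j + n - 1]) - row[j - 1];
        out[j] = static_cast<float>(sum) / width;
    }
    return out;
}
```

The running sum is inherently sequential along a row. In a CUDA port, one thread could therefore own one row, which is why splitting the 1D signal into many rows is what exposes the parallelism.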

Rearrange the 2^16 samples into a 2D image 1024 pixels wide and 64 high, and run the horizontal box filter on it with a filter width of 32. To handle the row boundaries, replicate the leftmost 32 pixels of each row onto the right edge of the previous row (that is, offset by one row). Similarly, replicate the rightmost 32 pixels of each row onto the left edge of the next row, giving a total width of 1088 pixels. That solves the overlap problem, since the original box filter does not filter across rows.
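Here is a serial sketch of the rearrangement described above. It assumes a row-major layout, and it fills with zeros wherever there is no neighboring row (before the first row and after the last). padRows is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: lay a 1D signal out as rows of `width` samples, each
// padded with `pad` samples on both sides copied from the neighboring rows,
// so a filter that never crosses rows still sees the 1D neighborhood.
// Assumption: pads with no neighboring row (start/end of the signal) are 0.
std::vector<int> padRows(const std::vector<int>& x, std::size_t width,
                         std::size_t pad) {
    assert(width > 0 && pad <= width && x.size() % width == 0);
    const std::size_t rows = x.size() / width;   // 2^16 / 1024 = 64
    const std::size_t stride = width + 2 * pad;  // 1024 + 2 * 32 = 1088
    std::vector<int> img(rows * stride, 0);
    for (std::size_t r = 0; r < rows; ++r) {
        int* out = &img[r * stride];
        const std::size_t start = r * width;     // index of row r's first sample
        for (std::size_t k = 0; k < pad; ++k) {
            if (r > 0) out[k] = x[start - pad + k];                         // tail of row r-1
            if (r + 1 < rows) out[pad + width + k] = x[start + width + k];  // head of row r+1
        }
        for (std::size_t c = 0; c < width; ++c)
            out[pad + c] = x[start + c];         // row r itself
    }
    return img;
}
```

With width 1024 and pad 32, a 2^16-sample input becomes 64 rows of 1088 pixels. Each padded row can then be filtered independently.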

The SDK sample is nicely parallelized and already quite fast, so you don’t have to reinvent the wheel. However, you may need to modify the data format to suit your needs (integers instead of 8-bit true-color RGB pixels).

SPWorley · December 5, 2008, 10:03am

If your integers are all less than 65536 (so that a running sum over 2^16 of them stays below 2^32 and fits in a 32-bit unsigned integer):

Do a prefix sum on the array and call the new array s. It’s a sum table.
Then the moving average between A and B is just (s[B]-s[A])/(B-A).
If the moving average has a constant width, 1/(B-A) can be precomputed, and the array lookups can be nicely coalesced (in fact automagically on G200).
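Here is a minimal serial sketch of the sum-table method. It assumes the inputs fit in 16 bits and the array has at most 2^16 entries, so every prefix sum fits in a uint32_t. It uses an exclusive scan with a leading zero, s[i] = x[0] + ... + x[i-1], so output i uses A = i and B = i + n. The scan is a plain loop here; on the GPU it would be replaced by a parallel prefix sum. The second loop maps to one thread per output. The name movingAverage is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum-table moving average. Assumes values < 65536 and at most 2^16 of them,
// so every entry of the exclusive prefix sum s fits in a uint32_t.
std::vector<float> movingAverage(const std::vector<std::uint16_t>& x,
                                 std::size_t n) {
    assert(n > 0 && n <= x.size());
    std::vector<std::uint32_t> s(x.size() + 1, 0);  // s[i] = x[0] + ... + x[i-1]
    for (std::size_t i = 0; i < x.size(); ++i)      // serial scan; a parallel
        s[i + 1] = s[i] + x[i];                     // prefix sum on the GPU
    const float inv = 1.0f / static_cast<float>(n); // precomputed 1/(B-A)
    std::vector<float> y(x.size() - n + 1);
    // Each output is independent (A = i, B = i + n): one thread per output.
    for (std::size_t i = 0; i < y.size(); ++i)
        y[i] = static_cast<float>(s[i + n] - s[i]) * inv;
    return y;
}
```

Every output reads just two table entries, so the cost per output does not grow with the window length n. Neighboring outputs also read neighboring entries, which is what makes the loads coalesce.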

If your integer values span the full 32-bit range (up to 2^32), use two sum arrays, one for the low word and one for the high word.
This would also help for wider ranges, not just higher values.
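Below is one way to read the two-sum-array suggestion. It assumes "low word" and "high word" mean the low and high 16 bits of each 32-bit value, and that there are at most 2^16 inputs. Each half is below 65536, so each half's prefix sum fits in 32 bits, and the window sum is rebuilt in 64 bits. The function name windowSums32 is hypothetical. It returns window sums, which you would divide by n to get averages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two sum tables for full 32-bit inputs: scan the low and high 16-bit words
// separately (each scan fits in a uint32_t for at most 2^16 inputs), then
// rebuild each window sum in 64 bits as (high sum << 16) + low sum.
std::vector<std::uint64_t> windowSums32(const std::vector<std::uint32_t>& x,
                                        std::size_t n) {
    assert(n > 0 && n <= x.size() && x.size() <= (std::size_t{1} << 16));
    std::vector<std::uint32_t> lo(x.size() + 1, 0), hi(x.size() + 1, 0);
    for (std::size_t i = 0; i < x.size(); ++i) {
        lo[i + 1] = lo[i] + (x[i] & 0xFFFFu);  // scan of low words
        hi[i + 1] = hi[i] + (x[i] >> 16);      // scan of high words
    }
    std::vector<std::uint64_t> w(x.size() - n + 1);
    for (std::size_t i = 0; i < w.size(); ++i) {
        const std::uint64_t low = lo[i + n] - lo[i];
        const std::uint64_t high = hi[i + n] - hi[i];
        w[i] = (high << 16) + low;  // window sum; divide by n for the average
    }
    return w;
}
```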

hajisaib · February 17, 2011, 1:16pm

Hi,

I have a question about A and B: are they arrays, or does each refer to a single element?

SPWorley · February 17, 2011, 6:34pm

A and B are array indices that define the range you want to sum over.
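To make the indices concrete, here is a small worked example with made-up data. It assumes the exclusive convention s[0] = 0, s[i] = x[0] + ... + x[i-1], so (s[B]-s[A])/(B-A) averages x[A] through x[B-1]. With an inclusive scan, the same formula covers x[A+1] through x[B]. The function name rangeAverage is illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A and B are plain indices into the sum table s (exclusive convention,
// s[0] = 0), so (s[B] - s[A]) / (B - A) is the mean of x[A .. B-1].
double rangeAverage(const std::vector<long>& s, std::size_t A, std::size_t B) {
    assert(A < B && B < s.size());
    return static_cast<double>(s[B] - s[A]) / static_cast<double>(B - A);
}

// x = {3, 1, 4, 1, 5}  ->  s = {0, 3, 4, 8, 9, 14}
// rangeAverage(s, 1, 4) averages x[1], x[2], x[3]: (9 - 3) / 3 = 2.0
```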
