Simple problem - but how to do it fast? Suggestions welcome

First off - I’m new to GPU/CUDA computing. I’ve spent a few weeks reading up and trying a few ideas out, but I could do with the collective advice of the community. I have a solution, but I’m sure I (we) could do better.

I have a 2D float array of data, say 1024 x 512. I need to perform many small (say 32-element) summations from this array, which represent curves in the array, and store the results in another array (intensity). A separate array (1024 x 32) represents the curves, i.e. it provides the column offsets into the data array. I perform the 1024 32-element sums, then move down the data array one sample and repeat all the sums.
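To make sure that description is unambiguous, here is a plain-C host reference of what I mean. The sizes are scaled down for illustration (the names `NCURVES`, `CURVELEN`, etc. are just mine), and the 16-element row padding matches my kernel below:

```c
/* Illustrative sizes -- scaled down from the 1024 x 512 case above. */
#define NCURVES   8    /* curves per row (1024 in the real problem)    */
#define CURVELEN  4    /* elements summed per curve (32 in the post)   */
#define NROWS     6    /* starting rows (512 in the real problem)      */
#define PAD       16   /* data rows are actually 16 elements longer    */
#define ROWLEN    (NCURVES + PAD)

/* For each starting row r and curve c:
   intensity[c + NCURVES*r] = sum over e of
       data[curves[c + NCURVES*e] + (e + r)*ROWLEN]              */
void array_sums_ref(float *intensity, const float *data, const int *curves)
{
    for (int r = 0; r < NROWS; ++r)             /* starting row          */
        for (int c = 0; c < NCURVES; ++c) {     /* which curve           */
            float s = 0.0f;
            for (int e = 0; e < CURVELEN; ++e)  /* element of the curve  */
                s += data[curves[c + NCURVES*e] + (e + r)*ROWLEN];
            intensity[c + NCURVES*r] = s;
        }
}
```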

That is my attempt to describe it in words. Below is the simple kernel I have used to do this. For the example dimensions I suggested above, my thread block size is [1024, 1] and my grid size is [32, 512]. As you will see, I use atomic adds.

__global__ void array_sums4(float * intensity, const float * data, const int * curves)
{
    int idx1 = threadIdx.x + blockDim.x * blockIdx.y;
    int idx2 = threadIdx.x + blockDim.x * blockIdx.x;
    int idx3 = curves[idx2] + (blockIdx.x + blockIdx.y) * (blockDim.x + 16); // +16 as data array is actually 16 elements longer
    atomicAdd(&intensity[idx1], data[idx3]);
}
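For completeness, this is roughly how I launch it (host-side sketch; `d_intensity`, `d_data` and `d_curves` are just my names for the device buffers, allocation and copies omitted):

```cuda
// One thread per curve; one block per (curve element, starting row).
dim3 block(1024, 1);   // threadIdx.x indexes the curve
dim3 grid(32, 512);    // blockIdx.x = curve element, blockIdx.y = starting row

// Sums accumulate via atomicAdd, so the output must start at zero.
cudaMemset(d_intensity, 0, 1024 * 512 * sizeof(float));
array_sums4<<<grid, block>>>(d_intensity, d_data, d_curves);
```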

Any suggestions on a better approach to this problem?

Use shared memory to perform the atomic adds inside a block, and then do a single atomic add to global memory.
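Something along these lines (a sketch, not tested: I've reshaped the block to (32, 32) so that all 32 contributions to one intensity element land in the same block, launched with grid (32, 512), and hard-coded the example dimensions from the post):

```cuda
__global__ void array_sums_shared(float *intensity, const float *data,
                                  const int *curves)
{
    // One shared accumulator per curve handled by this block.
    __shared__ float partial[32];

    int e   = threadIdx.x;                    // element within the curve (0..31)
    int c   = threadIdx.y + 32 * blockIdx.x;  // which curve (0..1023)
    int row = blockIdx.y;                     // starting row (0..511)

    if (e == 0) partial[threadIdx.y] = 0.0f;
    __syncthreads();

    // Same indexing as the original kernel; 1024 + 16 is the padded row length.
    int src = curves[c + 1024 * e] + (e + row) * (1024 + 16);

    // Atomic add in shared memory -- far less contention than in global.
    atomicAdd(&partial[threadIdx.y], data[src]);
    __syncthreads();

    // With this block shape all 32 contributions are already in the block,
    // so a plain store is enough; keep a global atomicAdd here instead if
    // you split the reduction across more blocks.
    if (e == 0) intensity[c + 1024 * row] = partial[threadIdx.y];
}
```

This removes the 32-way contention on each global intensity element that the original kernel has (32 blocks all doing atomicAdd to the same address).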