Best way to sum the elements of a block array

Hi,

I want to sum the elements of a per-block shared-memory array. Here's what I have so far:

extern __shared__ int data[];

// ...

// Sum all elements from data and place the result in data[0]
int nextInterval;
for (int interval = 1; interval < blockDim.x; interval = nextInterval) {
   nextInterval = 2 * interval;
   int positionSum = threadIdx.x + interval;
   if (threadIdx.x % nextInterval == 0 && positionSum < blockDim.x)
      data[threadIdx.x] += data[positionSum];
   __syncthreads();
}

This works fine, but since I'm a newbie, I'd like to know whether there is a better (that is, faster) way to do this.

Thanks

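Take a look at how the reduction example in the CUDA SDK sums data in a block. Modulo is a pretty expensive operation on the GPU, and with threadIdx.x % nextInterval the threads that do work are scattered thinly across every warp instead of being packed together. Also, if your data is floating point and numerical error is a problem, look into Kahan summation.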
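To make the suggestion concrete, here is a minimal sketch of a modulo-free block sum using a halving stride, which is one common way to write this kind of reduction. It is not copied from the SDK sample. The kernel name blockSum, the in/out/n parameters, and the host driver are made up for illustration, and the loop assumes blockDim.x is a power of two.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each block sums its slice of `in` and writes
// one partial sum to out[blockIdx.x].
// Assumption: blockDim.x is a power of two.
__global__ void blockSum(const int *in, int *out, int n)
{
    extern __shared__ int data[];

    int tid = threadIdx.x;
    int gid = blockIdx.x * blockDim.x + tid;

    // One element per thread; out-of-range threads contribute 0.
    data[tid] = (gid < n) ? in[gid] : 0;
    __syncthreads();

    // Halve the stride each step. Active threads are 0..stride-1, so
    // no modulo is needed and the working threads stay contiguous.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            data[tid] += data[tid + stride];
        __syncthreads();  // reached by every thread in the block
    }

    // data[0] now holds the sum for this block.
    if (tid == 0)
        out[blockIdx.x] = data[0];
}

int main()
{
    const int n = 1000;
    const int threads = 256;  // power of two, as the kernel assumes
    const int blocks = (n + threads - 1) / threads;

    int h_in[n];
    for (int i = 0; i < n; ++i)
        h_in[i] = 1;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    // Third launch parameter sizes the extern __shared__ array.
    blockSum<<<blocks, threads, threads * sizeof(int)>>>(d_in, d_out, n);

    int h_out[blocks];
    cudaMemcpy(h_out, d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);

    // Add the per-block partial sums on the host.
    int total = 0;
    for (int b = 0; b < blocks; ++b)
        total += h_out[b];
    printf("total = %d\n", total);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

If your block size is not a power of two, you would need either to pad the shared array with zeros up to the next power of two or to keep a bounds check like the positionSum < blockDim.x test in the original loop.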

Thanks.