Best way to sum the elements of a block array


I want to sum the elements of a per-block shared-memory array. Here’s what I have so far:

extern __shared__ int data[];

// ...

// Sum all elements of data and place the result in data[0]
int nextInterval;
for (int interval = 1; interval < blockDim.x; interval = nextInterval) {
   nextInterval = 2 * interval;
   int positionSum = threadIdx.x + interval;
   if (threadIdx.x % nextInterval == 0 && positionSum < blockDim.x)
      data[threadIdx.x] += data[positionSum];
   __syncthreads();  // partial sums from this step must be visible before the next one
}

This works fine, but since I’m a newbie, I would like to know whether there is a faster way to do this.


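Take a look at how the reduction example in the CUDA SDK samples sums data within a block. Modulo is a pretty expensive operation on the GPU, and the threadIdx.x % nextInterval test leaves the active threads scattered across warps, so most lanes in each warp sit idle. Also, if you ever sum floating-point values and numerical error is a problem, look into Kahan summation.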
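
For reference, here is a rough sketch of the modulo-free, sequential-addressing style of block reduction. It is not the SDK sample itself, only an illustration under stated assumptions: the names blockSum and sumOnDevice are made up for this post, the block size is assumed to be a power of two, the input is assumed to fit in a single block, and error checking on the CUDA calls is left out for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: sums n ints from `in` using a single block.
// Assumes blockDim.x is a power of two, n <= blockDim.x, and that the
// launch provides blockDim.x * sizeof(int) bytes of dynamic shared memory.
__global__ void blockSum(const int *in, int *out, int n)
{
    extern __shared__ int data[];

    // One element per thread; pad with 0 so the reduction tree is full.
    data[threadIdx.x] = ((int)threadIdx.x < n) ? in[threadIdx.x] : 0;
    __syncthreads();

    // Halve the active range each step: thread t adds element t + stride.
    // No modulo is needed, and the active threads stay contiguous.
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            data[threadIdx.x] += data[threadIdx.x + stride];
        __syncthreads();  // outside the if, so every thread reaches it
    }

    if (threadIdx.x == 0)
        *out = data[0];
}

// Hypothetical host helper: copies n ints to the device, runs one block
// of `threads` threads, and returns the sum. Return codes are not checked.
int sumOnDevice(const int *host, int n, int threads)
{
    int *dIn = nullptr, *dOut = nullptr;
    int result = 0;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpy(dIn, host, n * sizeof(int), cudaMemcpyHostToDevice);
    blockSum<<<1, threads, threads * sizeof(int)>>>(dIn, dOut, n);
    cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Calling sumOnDevice(values, 100, 128) on an array of 100 ints would run the kernel with 128 threads, the last 28 of which contribute zeros. Whether this is actually faster for your block size is worth measuring; the SDK example goes further with unrolling and other optimizations.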