Hi,
I want to sum the elements of a per-block array held in shared memory. Here is what I have so far:
extern __shared__ int data[];
// ...
// Sum all elements from data and place the result in data[0]
int nextInterval;
for (int interval = 1; interval < blockDim.x; interval = nextInterval) {
    nextInterval = 2 * interval;
    int positionSum = threadIdx.x + interval;
    if (threadIdx.x % nextInterval == 0 && positionSum < blockDim.x)
        data[threadIdx.x] += data[positionSum];
    __syncthreads();
}
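For context, here is a minimal sketch of how the loop might sit inside a complete kernel. The kernel name `blockSum`, the host helper `sumOnDevice`, the zero-padding of the shared array, and the single-block launch are all assumptions made for illustration; they are not part of my real code, and error checking is left out.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each block sums up to blockDim.x ints from `in`
// and writes its total to out[blockIdx.x].
__global__ void blockSum(const int *in, int *out, int n) {
    extern __shared__ int data[];

    int global = blockIdx.x * blockDim.x + threadIdx.x;
    // Pad with zeros past the end of the input so the sum is unaffected.
    data[threadIdx.x] = (global < n) ? in[global] : 0;
    __syncthreads();

    // Sum all elements from data and place the result in data[0]
    int nextInterval;
    for (int interval = 1; interval < blockDim.x; interval = nextInterval) {
        nextInterval = 2 * interval;
        int positionSum = threadIdx.x + interval;
        if (threadIdx.x % nextInterval == 0 && positionSum < blockDim.x)
            data[threadIdx.x] += data[positionSum];
        __syncthreads();
    }

    // Thread 0 publishes the block's total.
    if (threadIdx.x == 0) out[blockIdx.x] = data[0];
}

// Hypothetical host helper: sums n ints with a single block.
// Assumes 1 <= n <= 1024 (one thread per element); no error checking.
int sumOnDevice(const int *host, int n) {
    int *dIn = nullptr, *dOut = nullptr, result = 0;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpy(dIn, host, n * sizeof(int), cudaMemcpyHostToDevice);

    // Third launch parameter sizes the extern __shared__ array.
    blockSum<<<1, n, n * sizeof(int)>>>(dIn, dOut, n);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Calling `sumOnDevice(values, count)` from host code returns the total for one block; the `positionSum < blockDim.x` check is what lets the block size be something other than a power of two.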
This works fine, but since I'm new to CUDA, I would like to know whether there is a better (that is, faster) way to do this.
Thanks