Reduction #1: Interleaved Addressing

Hi,

in the official example [1] I'm not sure how the code knows how many elements are in *g_idata. reduce0 on page 7, for example, looks like it only handles indices up to blockDim.x - 1 and that's it.

So how does it work with input arrays that have more than blockDim.x elements?

There's probably a limit on blockDim.x, because otherwise you wouldn't need something like this:
int tid = threadIdx.x + blockIdx.x * blockDim.x;
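
To make the question concrete, here is a minimal sketch of how I would expect a reduce0-style kernel to cover an array larger than one block. It is not the code from [1]: the names reduce0_sketch and sum_on_gpu and the explicit length parameter n are my own assumptions, the block size is assumed to be a power of two, the partial sums are added up on the host for brevity, and error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical reduce0-style kernel (interleaved addressing).
// Assumption: n is passed explicitly so the last block can be partially full.
// Assumption: blockDim.x is a power of two.
__global__ void reduce0_sketch(const int *g_idata, int *g_odata, unsigned int n)
{
    extern __shared__ int sdata[];

    unsigned int tid = threadIdx.x;                          // index within the block
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;  // index within the whole array

    // Each thread loads one element; threads past the end load 0.
    sdata[tid] = (i < n) ? g_idata[i] : 0;
    __syncthreads();

    // Tree reduction in shared memory with interleaved addressing.
    for (unsigned int s = 1; s < blockDim.x; s *= 2) {
        if (tid % (2 * s) == 0) {
            sdata[tid] += sdata[tid + s];
        }
        __syncthreads();
    }

    // Thread 0 writes this block's partial sum.
    if (tid == 0) {
        g_odata[blockIdx.x] = sdata[0];
    }
}

// Hypothetical host helper: launches enough blocks to cover n elements,
// then adds the per-block partial sums on the CPU.
int sum_on_gpu(const std::vector<int> &h_in, unsigned int threads = 128)
{
    unsigned int n = static_cast<unsigned int>(h_in.size());
    if (n == 0) return 0;
    unsigned int blocks = (n + threads - 1) / threads;  // round up

    int *d_in = nullptr;
    int *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // Third launch argument is the dynamic shared memory size per block.
    reduce0_sketch<<<blocks, threads, threads * sizeof(int)>>>(d_in, d_out, n);

    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int total = 0;
    for (int p : partial) total += p;
    return total;
}
```

If this is right, the array length never appears inside the kernel in [1] because it is implied by the launch configuration (number of blocks times blockDim.x), and each block only ever reduces its own blockDim.x elements.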

[1] http://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/reduction/doc/reduction.pdf