Reduction #1: Interleaved Addressing


In the official example [1], I'm not sure how the code knows how many elements are in *g_idata. reduce0 on page 7, for example, looks like it only works for inputs of size blockDim.x-1 and nothing larger.

So how does it work with input arrays that are larger than blockDim.x?

There’s probably a limit on blockDim.x, because otherwise you wouldn’t need something like this:
int tid = threadIdx.x + blockIdx.x * blockDim.x;
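
To make the question concrete, here is a rough sketch of how I would expect a global index like that to be used on an array larger than one block. This is not the code from [1]: the kernel name blockSum, the host helper sumOnDevice, the explicit element count n, and the block size of 256 are all my own assumptions. The idea is that each block reduces its own blockDim.x-sized slice and writes one partial sum, and the host relaunches the kernel on the partial sums until one value is left. As far as I know, blockDim.x really is limited (1024 threads per block on current GPUs), which would be why several blocks are needed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each block sums up to blockDim.x elements of g_idata
// and writes one partial sum to g_odata[blockIdx.x]. The element count n is
// passed explicitly (an assumption, not taken from [1]).
__global__ void blockSum(const int *g_idata, int *g_odata, unsigned int n) {
    extern __shared__ int sdata[];
    unsigned int tid = threadIdx.x;                          // index inside this block
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;  // index inside the whole array
    sdata[tid] = (i < n) ? g_idata[i] : 0;                   // pad the last block with zeros
    __syncthreads();
    // Interleaved addressing; assumes blockDim.x is a power of two.
    for (unsigned int s = 1; s < blockDim.x; s *= 2) {
        if (tid % (2 * s) == 0) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) g_odata[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: sums n > 0 ints by launching blockSum repeatedly.
// Error checking is left out to keep the sketch short.
int sumOnDevice(const int *h_in, unsigned int n) {
    const unsigned int threads = 256;                    // assumed block size
    unsigned int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    while (true) {
        blockSum<<<blocks, threads, threads * sizeof(int)>>>(d_in, d_out, n);
        if (blocks == 1) break;                          // d_out[0] holds the total
        n = blocks;                                      // partial sums become the new input
        blocks = (n + threads - 1) / threads;
        int *tmp = d_in; d_in = d_out; d_out = tmp;      // reuse the two buffers
    }
    int result = 0;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

With 1000 input elements and 256 threads per block, the first launch uses 4 blocks and the second launch reduces the 4 partial sums in a single block, so the kernel itself never needs to handle more than blockDim.x elements at a time.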