Please explain the vector reduction

Hi all, I’m new to the CUDA world.
In this code:

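```
// N and threadsPerBlock are assumed to be compile-time constants defined elsewhere.
__global__ void dot( float *a, float *b, float *c ) {
    // per-block scratch space for the partial sums
    __shared__ float cache[threadsPerBlock];
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    int cacheIndex = threadIdx.x;
    float temp = 0;
    while (tid < N) {
        temp += a[tid] * b[tid];
        tid += blockDim.x * gridDim.x;
    }
    // set the cache values
    cache[cacheIndex] = temp;
    // synchronize threads in this block
    __syncthreads();
    // for reductions, threadsPerBlock must be a power of 2
    // because of the following code
    int i = blockDim.x/2;
    while (i != 0) {
        if (cacheIndex < i)
            cache[cacheIndex] += cache[cacheIndex + i];
        // wait until every thread has finished this step before halving again
        __syncthreads();
        i /= 2;
    }
    if (cacheIndex == 0)
        c[blockIdx.x] = cache[0];
}
```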

I don’t understand the purpose of int i = blockDim.x/2;
(the reduction happens within a single block). Please explain it to me.

Thank you.

Let’s assume we have tpb = blockDim.x threads in the block. The reduction algorithm halves the amount of data at each iteration. The first iteration reduces the tpb partial sums to tpb/2, which is why i starts at blockDim.x/2. The next iteration reduces them to tpb/4, and so on until only one element is left, cache[0], which contains the sum of everything in the block.
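To make the halving concrete, here is a minimal host-side sketch in plain C++ (it runs on the CPU, not the GPU). The function name `simulate_block_reduction` is made up for illustration and is not part of the original kernel. The inner loop stands in for the threads with `cacheIndex < i` doing their additions in parallel, and the block size is assumed to be a power of 2, as the kernel's comment requires.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: replays the kernel's reduction loop on the CPU.
// `cache` plays the role of the per-block shared array; its size plays
// the role of blockDim.x and is assumed to be a power of 2.
float simulate_block_reduction(std::vector<float> cache) {
    for (std::size_t i = cache.size() / 2; i != 0; i /= 2) {
        // On the GPU these additions are done at the same time by the
        // threads whose cacheIndex is below i. Running them one after
        // another gives the same result, because a step only writes
        // indices below i and only reads unmodified indices at or above i.
        for (std::size_t cacheIndex = 0; cacheIndex < i; ++cacheIndex) {
            cache[cacheIndex] += cache[cacheIndex + i];
        }
        // In the kernel, a block-wide barrier (__syncthreads()) belongs
        // here so every thread sees the new partial sums before i halves.
    }
    return cache.empty() ? 0.0f : cache[0];
}
```

With 8 values 1..8, the array's first entries go 6, 8, 10, 12 after the step with i = 4, then 16, 20 after i = 2, and finally 36 after i = 1, so three steps are enough for 8 elements.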
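Starting with a stride of tpb/2 and halving it also keeps the access pattern friendly to shared memory: at each step, threads 0 to i-1 read and write consecutive elements of cache, so there are no bank conflicts. A variant in which each thread's index is multiplied by the stride would have several threads hitting the same bank.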
Check this image: