I am trying to learn about parallel reduction through the example in the SDK. The white paper for parallel reduction explains seven steps to optimize the process. I can understand the first six optimizations, but I can't understand the part on algorithm cascading.

The following code is from the white paper, slide 32:

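unsigned int tid = threadIdx.x;
unsigned int i = blockIdx.x * 2 * blockDim.x + threadIdx.x;
unsigned int gridSize = blockSize * 2 * gridDim.x;
sdata[tid] = 0;
// Each pass adds two elements per thread, then jumps ahead by gridSize.
// The slide uses += so that the sum accumulates across passes.
while (i < n) {
    sdata[tid] += x[i] + x[i + blockDim.x];
    i += gridSize;
}
__syncthreads();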

Is gridSize here half the length of the vector to be reduced? And what does n refer to (in while(i<n))?
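To make the question concrete, here is a minimal host-side sketch of one possible reading of the loop, written in plain C++ so it runs without a GPU. It assumes that n is the total number of elements in x, which is exactly the point in question. The function name cascade_partials and the parameter numBlocks (standing in for gridDim.x) are hypothetical and do not come from the SDK. The bounds check on the second load is also an addition for safety, since the quoted code reads x[i + blockDim.x] unconditionally.

```cpp
#include <vector>

// Hypothetical emulation of the cascading loop for every (block, thread) pair.
// Assumption: n is the total number of input elements.
// Returns one partial sum per thread, i.e. what sdata[tid] would hold in each
// block just before __syncthreads().
std::vector<float> cascade_partials(const std::vector<float>& x,
                                    unsigned int blockSize,
                                    unsigned int numBlocks) {
    const unsigned int n = static_cast<unsigned int>(x.size());
    // Elements consumed by the whole grid in one pass of the loop.
    const unsigned int gridSize = blockSize * 2 * numBlocks;
    std::vector<float> partial(blockSize * numBlocks, 0.0f);

    for (unsigned int block = 0; block < numBlocks; ++block) {
        for (unsigned int tid = 0; tid < blockSize; ++tid) {
            unsigned int i = block * 2 * blockSize + tid;
            float sum = 0.0f;  // plays the role of sdata[tid]
            while (i < n) {
                sum += x[i];
                // Guard added for this sketch; not present in the quoted code.
                if (i + blockSize < n) {
                    sum += x[i + blockSize];
                }
                i += gridSize;
            }
            partial[block * blockSize + tid] = sum;
        }
    }
    return partial;
}
```

For example, with 64 elements all equal to 1, blockSize = 4 and numBlocks = 2, gridSize is 16, each of the 8 threads makes 4 passes, and every partial sum comes out as 8 (64 in total). If this reading is right, gridSize is the number of elements the whole grid covers per pass, which depends on the launch configuration and not on the vector length.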

Thanks.