I am trying to learn about parallel reduction through the example in the CUDA SDK. The white paper for parallel reduction explains seven steps to optimize the process. I understand the first six optimizations, but I can't understand the part on algorithm cascading.
The following code is from slide 32 of the white paper:
unsigned int tid=threadIdx.x;
unsigned int i=blockIdx.x*2*blockDim.x+threadIdx.x;
unsigned int gridSize=blockSize*2*gridDim.x;
Is gridSize here half the length of the vector being reduced? And what does n refer to (in while(i<n))?