# Tiled Reduction Only Works if the Tile Count Is 1

I have a 2D square float matrix in which each block calculates the maximum value of its corresponding row using a reduction.

Assumptions for this minimized kernel:

Dim is the number of rows and columns of the matrix, and also the number of blocks. It is a power of 2.

Since a block can have at most 512 threads on CUDA compute capability 1.1, I can only compare 512 elements in one reduction, so I want to tile each row into sub-parts.

The minimum value in the matrix is 0, so the shared maximum can be initialized to 0.

The width of a tile must be a power of 2 for the reduction to work.
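To illustrate why the tile width has to be a power of 2, here is a minimal sequential sketch of the halving-stride reduction in plain C++. The function name `treeMax` is hypothetical and not part of the kernel; the inner loop over `t` stands in for the threads of one block.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential model of the in-block tree reduction: in each step, "thread" t
// (for t < s) combines element t with element t + s, then s is halved.
// The vector is taken by value so the caller's data is not modified.
float treeMax(std::vector<float> values)
{
    const std::size_t n = values.size();
    // The halving stride only reaches every element if n is a power of 2.
    assert(n > 0 && (n & (n - 1)) == 0);
    for (std::size_t s = n >> 1; s > 0; s >>= 1) {
        for (std::size_t t = 0; t < s; ++t) {
            values[t] = std::max(values[t], values[t + s]);
        }
    }
    return values[0];
}
```

With 6 elements, for example, the strides would be 3 and 1, and the element at index 2 would never be compared against index 0, which is why the sketch asserts a power-of-2 size.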

This code only calculates correct results if the tile count is 1, which means the width of the tile is equal to the dimension of the matrix.
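For reference, the tile count follows from the matrix dimension and the tile width. A minimal host-side sketch, assuming a hypothetical helper named `tileCount` (the original host code is not shown in this post):

```cpp
#include <cassert>

// Hypothetical host-side helper: number of tiles needed to cover one row.
// Under the assumptions above (dim and tileWidth are powers of 2 and
// dim >= tileWidth) the division is exact; rounding up is a safety net.
unsigned int tileCount(unsigned int dim, unsigned int tileWidth)
{
    assert(tileWidth > 0);
    return (dim + tileWidth - 1) / tileWidth;
}
```

For a 1024 x 1024 matrix and a tile width of 512, this gives 2 tiles per row; a tile count of 1 would only be right when the tile width equals the matrix dimension.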

``````
#define TILE_WIDTH 512 // power of 2 and up to 2048 on CUDA 1.1

/*
 * @param mat            float**  Input square float matrix
 * @param widthTileCount int      Number of tiles per row (iteration count)
 * @param maxArr         float*   Output array holding the maximum value of each row
 */
__global__ void cluster(float** mat,
                        unsigned int widthTileCount,
                        float* maxArr)
{
    // shared reduction array
    __shared__ float values[TILE_WIDTH];
    // shared maximum value of this block
    __shared__ float max;

    // initialize shared memory (only thread 0 ever touches max)
    if (threadIdx.x == 0)
        max = 0.0f;

    // column index
    unsigned int colIdx = 0;

    /*
     * loop over the tiles required to cover the row, reduce each tile,
     * and update the shared maximum after each reduction
     */
    for (unsigned int i = 0; i < widthTileCount; i++)
    {
        // get linear column index of this thread
        colIdx = i * TILE_WIDTH + threadIdx.x;

        // load one element per thread (assumes blockDim.x == TILE_WIDTH)
        values[threadIdx.x] = mat[blockIdx.x][colIdx];
        __syncthreads();

        /*
         * Do Reduction
         */
        for (unsigned int s = blockDim.x >> 1; s > 0; s >>= 1)
        {
            if (threadIdx.x < s)
                values[threadIdx.x] = fmaxf(values[threadIdx.x], values[threadIdx.x + s]);
            __syncthreads();
        }

        // compare and set shared max value with the one of the current tile
        if (threadIdx.x == 0)
            max = fmaxf(max, values[0]);
        __syncthreads();
    }

    // output data
    if (threadIdx.x == 0)
        maxArr[blockIdx.x] = max;
}
``````
``````
MaxArr:
is:
0.985   0.991   0.932   0.948   0.992   0.926   0.704   0.893   0.970   0.941
should be:
0.985   0.991   0.972   0.993   0.992   0.977   0.957   0.973   0.970   0.941
``````

I don't get it; I followed the steps on paper. Some values in the output are right and some are wrong.

I would be thankful for any hints.

Edit: Yeah, if the tile count (`widthTileCount`) is wrong, the results cannot be correct -.-
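A small CPU model shows how a too-small tile count would produce this pattern of some right and some wrong values. This is only a sketch in plain C++ with a hypothetical function name `rowMaxTiled`, and it assumes the tile count was too low rather than too high, which fits the output above, where every wrong value is smaller than the expected one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Maximum over the first tileCount * tileWidth entries of one row,
// mirroring the kernel's outer loop over tiles. Values are assumed >= 0,
// so 0 is a valid starting maximum.
float rowMaxTiled(const std::vector<float>& row,
                  std::size_t tileWidth,
                  std::size_t tileCount)
{
    assert(tileWidth * tileCount <= row.size());
    float best = 0.0f;
    for (std::size_t i = 0; i < tileCount; ++i) {
        for (std::size_t t = 0; t < tileWidth; ++t) {
            best = std::max(best, row[i * tileWidth + t]);
        }
    }
    return best;
}
```

If a row's maximum happens to lie in the tiles that are actually scanned, the result is still right; otherwise the kernel reports a smaller value. For the row `{0.2, 0.7, 0.1, 0.9}` with a tile width of 2, a tile count of 2 yields 0.9, while a tile count of 1 yields 0.7.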