Hello everyone,
I have a three-dimensional thread block of dimensions (BLOCK_X, BLOCK_Y, BLOCK_Z), a 2D shared-memory array
of size (BLOCK_X, BLOCK_Y), and a shared-memory array of booleans of the same size.
[codebox]
__shared__ REAL module[BLOCK_X][BLOCK_Y];
__shared__ bool computedModule[BLOCK_X][BLOCK_Y];
[/codebox]
Each thread of the block writes to the shared-memory cell selected by its threadIdx.x and threadIdx.y.
Hence, BLOCK_Z threads write to the same cell of shared memory…
Since BLOCK_X*BLOCK_Y > warpSize, I can be fairly sure that
thread (i,j,z) and thread (i,j,another_z) are in different warps and will not be executed at the same time.
Based on this assumption, I wrote my kernel as follows:
[codebox]
if ((i < xdim) && (j < (ydim - 1)))
{
    //-- Get position component
    p1 = getVectorElementMat3d(vPos, i, j, k, xdim, pitch_position_d);
    p2 = getVectorElementMat3d(vPos, i, (j + 1), k, xdim, pitch_position_d);

    //-- Accumulate the squared difference into the cell shared by BLOCK_Z threads
    module[threadIdx.x][threadIdx.y] += pow(p2 - p1, 2);
    __syncthreads();

    //-- The first thread to find the flag unset takes the square root
    if (!computedModule[threadIdx.x][threadIdx.y])
    {
        module[threadIdx.x][threadIdx.y] = sqrt(module[threadIdx.x][threadIdx.y]);
        computedModule[threadIdx.x][threadIdx.y] = true;
    }
}
[/codebox]
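For comparison, here is a minimal sketch of a variant that would not depend on my scheduling assumption at all: each thread writes its squared term to its own shared-memory slot, and after the barrier a single thread per (x, y) cell sums the BLOCK_Z terms and takes the root. Everything in it that is not in my kernel above is an assumption made for illustration: the names `moduleKernel`, `runModuleDemo`, `diff` and `partial`, the block sizes, and `REAL` being `float`. Putting threadIdx.x on the last array index is also only my guess at a bank-friendly layout.

```cuda
#include <cuda_runtime.h>
#include <cmath>

// Illustrative values only; my real BLOCK_* sizes and REAL type may differ.
#define BLOCK_X 8
#define BLOCK_Y 8
#define BLOCK_Z 3
typedef float REAL;

// Hypothetical stand-in for my kernel: diff[z][y][x] already holds the
// difference (p2 - p1) that thread (x, y, z) would compute.
__global__ void moduleKernel(const REAL *diff, REAL *out)
{
    // One private slot per thread, so no two threads write the same cell.
    // threadIdx.x is the last index, so consecutive threads of a warp
    // touch consecutive 32-bit words.
    __shared__ REAL partial[BLOCK_Z][BLOCK_Y][BLOCK_X];

    const int tx = threadIdx.x, ty = threadIdx.y, tz = threadIdx.z;
    const REAL d = diff[(tz * BLOCK_Y + ty) * BLOCK_X + tx];
    partial[tz][ty][tx] = d * d;

    // Every thread of the block reaches this barrier (it is not inside a branch).
    __syncthreads();

    // A single thread per (x, y) cell sums the BLOCK_Z terms and takes the root.
    if (tz == 0) {
        REAL sum = 0;
        for (int z = 0; z < BLOCK_Z; ++z)
            sum += partial[z][ty][tx];
        out[ty * BLOCK_X + tx] = sqrtf(sum);
    }
}

// Hypothetical host helper: launches one block and returns the result for
// cell (0, 0), or -1 if a CUDA call fails.
REAL runModuleDemo()
{
    const int nCells = BLOCK_X * BLOCK_Y;
    const int nIn = nCells * BLOCK_Z;
    const REAL comps[BLOCK_Z] = {1, 2, 2};  // differences 1, 2, 2: Euclidean norm is 3

    REAL hDiff[BLOCK_X * BLOCK_Y * BLOCK_Z];
    for (int z = 0; z < BLOCK_Z; ++z)
        for (int c = 0; c < nCells; ++c)
            hDiff[z * nCells + c] = comps[z];

    REAL *dDiff = 0, *dOut = 0;
    if (cudaMalloc((void **)&dDiff, nIn * sizeof(REAL)) != cudaSuccess)
        return -1;
    if (cudaMalloc((void **)&dOut, nCells * sizeof(REAL)) != cudaSuccess) {
        cudaFree(dDiff);
        return -1;
    }
    cudaMemcpy(dDiff, hDiff, nIn * sizeof(REAL), cudaMemcpyHostToDevice);

    moduleKernel<<<1, dim3(BLOCK_X, BLOCK_Y, BLOCK_Z)>>>(dDiff, dOut);

    REAL hOut[BLOCK_X * BLOCK_Y];
    cudaError_t err = cudaMemcpy(hOut, dOut, nCells * sizeof(REAL),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dDiff);
    cudaFree(dOut);
    return (err == cudaSuccess) ? hOut[0] : -1;
}
```

This costs BLOCK_Z times more shared memory than my single `module` array, but it needs neither the `computedModule` flag nor any guess about which warps run together.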
It seems to work… but is it correct?
Should I also check for bank conflicts?
Thank you in advance
Francesco