Hello to all,

I have a three-dimensional thread block of dimensions (BLOCK_X, BLOCK_Y, BLOCK_Z), a 2D shared-memory array of size (BLOCK_X, BLOCK_Y), and a second shared-memory array of booleans of the same size.

[codebox]

__shared__ REAL module[BLOCK_X][BLOCK_Y];

__shared__ bool computedModule[BLOCK_X][BLOCK_Y];

[/codebox]

Each thread of the block writes to shared memory according to its threadIdx.x and threadIdx.y.

Hence, BLOCK_Z threads write to the same cell of shared memory…

Since BLOCK_X*BLOCK_Y > WarpSize, I can be pretty sure that thread (i,j,z) and thread (i,j,another_z) belong to different warps and so will not be executed at the same time.
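To make that reasoning concrete, here is a minimal sketch of how the warp index follows from the thread index. It assumes the usual CUDA linearization (x fastest, then y, then z) and a warp size of 32. The block dimensions and the helper names `linearThreadId` and `warpOf` are hypothetical, chosen only for illustration.

```cuda
#include <cstdio>

// Hypothetical block dimensions, for illustration only.
#define BLOCK_X 8
#define BLOCK_Y 8
#define BLOCK_Z 4

// Linear thread index within a block: x varies fastest, then y, then z.
__host__ __device__ inline int linearThreadId(int x, int y, int z)
{
    return x + y * BLOCK_X + z * BLOCK_X * BLOCK_Y;
}

// Warp index of a thread, assuming a warp size of 32.
__host__ __device__ inline int warpOf(int x, int y, int z)
{
    return linearThreadId(x, y, z) / 32;
}

int main()
{
    // Threads (3,5,0) and (3,5,1) both target module[3][5].
    printf("warp of (3,5,0): %d\n", warpOf(3, 5, 0));
    printf("warp of (3,5,1): %d\n", warpOf(3, 5, 1));
    return 0;
}
```

With these dimensions the two threads land in different warps. Being in different warps does not by itself fix the order in which their updates to the same cell happen.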

Based on this assumption, I wrote my kernel as follows:

[codebox]

```
if ((i < xdim) && (j < (ydim - 1)))
{
    //-- Get the position component at (i, j) and at its neighbour (i, j + 1)
    p1 = getVectorElementMat3d(vPos, i, j, k, xdim, pitch_position_d);
    p2 = getVectorElementMat3d(vPos, i, (j + 1), k, xdim, pitch_position_d);

    //-- Accumulate the squared difference; BLOCK_Z threads update the same cell
    module[threadIdx.x][threadIdx.y] += pow(p2 - p1, 2);
    __syncthreads();

    //-- The first thread to reach this point takes the square root once per cell
    if (!computedModule[threadIdx.x][threadIdx.y])
    {
        module[threadIdx.x][threadIdx.y] = sqrt(module[threadIdx.x][threadIdx.y]);
        computedModule[threadIdx.x][threadIdx.y] = true;
    }
}
```

[/codebox]
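For comparison, here is a minimal sketch of a variant that would not depend on the scheduling assumption at all. It is not my actual kernel. It assumes REAL is float, a device that supports `atomicAdd` on float in shared memory (compute capability 2.0 or higher), and a single block. The names `moduleKernel`, `runModuleSketch`, `diff` and `out` are hypothetical, and `diff` stands in for the `p2 - p1` values.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical block dimensions, for illustration only.
#define BLOCK_X 8
#define BLOCK_Y 8
#define BLOCK_Z 4
#define N_CELLS   (BLOCK_X * BLOCK_Y)
#define N_THREADS (BLOCK_X * BLOCK_Y * BLOCK_Z)

__global__ void moduleKernel(const float *diff, float *out)
{
    __shared__ float module[BLOCK_X][BLOCK_Y];

    // One thread per (x, y) cell zeroes the accumulator.
    if (threadIdx.z == 0)
        module[threadIdx.x][threadIdx.y] = 0.0f;
    __syncthreads();  // reached by every thread of the block

    int idx = threadIdx.x + threadIdx.y * BLOCK_X
            + threadIdx.z * BLOCK_X * BLOCK_Y;
    float d = diff[idx];

    // Atomic read-modify-write: the BLOCK_Z updates to one cell cannot be lost.
    atomicAdd(&module[threadIdx.x][threadIdx.y], d * d);
    __syncthreads();  // all contributions are in before the square root

    // Exactly one thread per cell takes the square root, so no flag is needed.
    if (threadIdx.z == 0)
        out[threadIdx.x + threadIdx.y * BLOCK_X] =
            sqrtf(module[threadIdx.x][threadIdx.y]);
}

// Host helper: fills every difference with 1.0f and runs one block.
void runModuleSketch(float *hOut)
{
    float hDiff[N_THREADS];
    for (int n = 0; n < N_THREADS; ++n)
        hDiff[n] = 1.0f;

    float *dDiff = NULL;
    float *dOut = NULL;
    cudaMalloc(&dDiff, N_THREADS * sizeof(float));
    cudaMalloc(&dOut, N_CELLS * sizeof(float));
    cudaMemcpy(dDiff, hDiff, N_THREADS * sizeof(float), cudaMemcpyHostToDevice);

    moduleKernel<<<1, dim3(BLOCK_X, BLOCK_Y, BLOCK_Z)>>>(dDiff, dOut);

    // The blocking copy also waits for the kernel to finish.
    cudaMemcpy(hOut, dOut, N_CELLS * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dDiff);
    cudaFree(dOut);
}

int main()
{
    float hOut[N_CELLS];
    runModuleSketch(hOut);
    printf("module[0][0] = %f\n", hOut[0]);
    return 0;
}
```

In this sketch both barriers sit outside any bounds check, so every thread of the block reaches them, and the square root is tied to `threadIdx.z == 0` instead of a shared flag.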

It seems to work… but is it correct?

Should I also check for bank conflicts?

Thank you in advance

Francesco