Concurrent access to shared memory

Hello to all,

I have a three-dimensional block of dimensions (BLOCK_X, BLOCK_Y, BLOCK_Z), a 2D shared-memory matrix of size (BLOCK_X, BLOCK_Y), and a shared-memory array of booleans of the same size.

[codebox]

__shared__ REAL module[BLOCK_X][BLOCK_Y];
__shared__ bool computedModule[BLOCK_X][BLOCK_Y];

[/codebox]

Each thread of the block writes to the shared-memory cell selected by its threadIdx.x and threadIdx.y.

Hence, BLOCK_Z threads write to the same cell of shared memory…

Since BLOCK_X*BLOCK_Y > warpSize, I can be pretty sure that thread (i,j,z) and thread (i,j,another_z) are in different warps and will not be executed at the same time.

Based on this assumption, I wrote my kernel as follows:

[codebox]

if ((i < xdim) && (j < (ydim - 1)))
{
	//-- Get position component
	p1 = getVectorElementMat3d(vPos, i, j, k, xdim, pitch_position_d);
	p2 = getVectorElementMat3d(vPos, i, (j + 1), k, xdim, pitch_position_d);

	module[threadIdx.x][threadIdx.y] += pow(p2 - p1, 2);

	__syncthreads();

	if (!computedModule[threadIdx.x][threadIdx.y])
	{
		module[threadIdx.x][threadIdx.y] = sqrt(module[threadIdx.x][threadIdx.y]);
		computedModule[threadIdx.x][threadIdx.y] = true;
	}
}

[/codebox]

It seems to work… but is it correct?

Should I also check for bank conflicts?

Thank you in advance

Francesco

It’s better to use

module[threadIdx.y][threadIdx.x]

instead of

module[threadIdx.x][threadIdx.y]

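to reduce bank conflicts (with the array declared as [BLOCK_Y][BLOCK_X]): neighbouring threads of a warp have consecutive threadIdx.x values, so with threadIdx.x as the last index they access consecutive shared-memory words.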
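To make the difference concrete, here is a rough host-side sketch that counts how many lanes of the first warp land in the same bank for each index order. It assumes 32 banks of one 4-byte word each, REAL being a 4-byte float, and a hypothetical 16x16 block footprint; `worstConflictDegree` and the constants are made up for this illustration, not taken from the original code.

```cuda
#define BLOCK_X   16  // hypothetical block width
#define BLOCK_Y   16  // hypothetical block height
#define WARP_SIZE 32
#define NUM_BANKS 32  // assumption: 32 banks, one 4-byte word each

// Returns the largest number of lanes of warp 0 that map to one bank.
// 1 means conflict-free; n means an n-way conflict.
int worstConflictDegree(bool xIsLastIndex)
{
    int hits[NUM_BANKS] = {0};
    for (int lane = 0; lane < WARP_SIZE; ++lane) {
        // Linear thread id in a block is x + y*BLOCK_X + z*BLOCK_X*BLOCK_Y,
        // so warp 0 covers these (x, y) pairs at z == 0.
        int tx = lane % BLOCK_X;
        int ty = lane / BLOCK_X;
        int word = xIsLastIndex
                 ? ty * BLOCK_X + tx   // module[ty][tx], declared [BLOCK_Y][BLOCK_X]
                 : tx * BLOCK_Y + ty;  // module[tx][ty], declared [BLOCK_X][BLOCK_Y]
        ++hits[word % NUM_BANKS];
    }
    int worst = 0;
    for (int b = 0; b < NUM_BANKS; ++b)
        if (hits[b] > worst) worst = hits[b];
    return worst;
}
```

Under these assumptions, `worstConflictDegree(false)` (the [x][y] order) reports an 8-way conflict and `worstConflictDegree(true)` (the [y][x] order) reports 1, i.e. no conflict. Devices with 16 banks resolve conflicts per half-warp, so the exact numbers change there, but [y][x] is still the conflict-free order.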

N.
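On the "is it correct?" part: being in different warps does not order the threads, so BLOCK_Z threads doing `+=` on one shared cell is an unsynchronized read-modify-write, and the check-then-set on the flag has the same problem. Below is a minimal sketch of one race-free alternative, not the original kernel. It swaps the `+=` for `atomicAdd` on shared memory (float, which needs compute capability 2.0 or later) and swaps the boolean flag for a `threadIdx.z == 0` test. The kernel name, the `diff` input (standing in for the precomputed `p2 - p1` values), the block sizes, and `runAndCheck` are all hypothetical.

```cuda
#include <cmath>
#include <cuda_runtime.h>

#define BLOCK_X 8  // hypothetical block dimensions
#define BLOCK_Y 8
#define BLOCK_Z 4

// One block only; diff holds one (p2 - p1) value per thread.
__global__ void moduleKernel(const float *diff, float *out)
{
    __shared__ float module[BLOCK_Y][BLOCK_X];
    const int tx = threadIdx.x, ty = threadIdx.y, tz = threadIdx.z;

    // Exactly one thread per (x, y) cell zeroes the accumulator.
    if (tz == 0)
        module[ty][tx] = 0.0f;
    __syncthreads();

    // The BLOCK_Z threads sharing a cell accumulate atomically.
    const float d = diff[(tz * BLOCK_Y + ty) * BLOCK_X + tx];
    atomicAdd(&module[ty][tx], d * d);
    __syncthreads();  // outside any bounds check, so every thread reaches it

    // Exactly one thread per cell takes the square root.
    if (tz == 0)
        out[ty * BLOCK_X + tx] = sqrtf(module[ty][tx]);
}

// Runs one block and returns the largest absolute error against a CPU
// reference, or -1 if a CUDA call fails.
float runAndCheck()
{
    const int nCells = BLOCK_X * BLOCK_Y;
    const int nThreads = nCells * BLOCK_Z;
    float hDiff[BLOCK_X * BLOCK_Y * BLOCK_Z];
    float hOut[BLOCK_X * BLOCK_Y];
    for (int i = 0; i < nThreads; ++i)
        hDiff[i] = (float)(i % 7) - 3.0f;  // small integers: sums are exact

    float *dDiff = NULL, *dOut = NULL;
    if (cudaMalloc(&dDiff, sizeof(hDiff)) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&dOut, sizeof(hOut)) != cudaSuccess) { cudaFree(dDiff); return -1.0f; }
    cudaMemcpy(dDiff, hDiff, sizeof(hDiff), cudaMemcpyHostToDevice);

    moduleKernel<<<1, dim3(BLOCK_X, BLOCK_Y, BLOCK_Z)>>>(dDiff, dOut);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    cudaFree(dDiff);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1.0f;

    float worst = 0.0f;
    for (int c = 0; c < nCells; ++c) {
        float sum = 0.0f;
        for (int z = 0; z < BLOCK_Z; ++z) {
            const float d = hDiff[z * nCells + c];
            sum += d * d;
        }
        const float e = fabsf(hOut[c] - sqrtf(sum));
        if (e > worst) worst = e;
    }
    return worst;
}
```

Calling `runAndCheck()` from `main` and comparing the result against a small tolerance is enough to try it out. Note that the sketch also uses the [y][x] index order suggested above, and that atomics on one cell serialize the BLOCK_Z updates, so a per-thread loop over z may be the better choice when BLOCK_Z is large.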