Concurrent access on shared memory

Hello to all,

I have a three-dimensional block of dimensions (BLOCK_X, BLOCK_Y, BLOCK_Z), a 2D shared-memory array of size (BLOCK_X, BLOCK_Y), and a shared-memory array of booleans of the same size.


    __shared__ REAL module[BLOCK_X][BLOCK_Y];
    __shared__ bool computedModule[BLOCK_X][BLOCK_Y];


Each thread of the block writes to shared memory at the cell given by its threadIdx.x and threadIdx.y. Hence, BLOCK_Z threads write to the same cell of shared memory.

Since BLOCK_X*BLOCK_Y > warpSize, I can be pretty sure that thread (i,j,z) and thread (i,j,another_z) belong to different warps and will not be executed at the same time.

Based on this assumption, I wrote my kernel as follows:


    if ((i < xdim) && (j < (ydim - 1)))
    {
        //-- Get position component
        p1 = getVectorElementMat3d(vPos, i, j, k, xdim, pitch_position_d);
        p2 = getVectorElementMat3d(vPos, i, (j + 1), k, xdim, pitch_position_d);

        module[threadIdx.x][threadIdx.y] += pow(p2 - p1, 2);
    }

    if (!computedModule[threadIdx.x][threadIdx.y])
    {
        module[threadIdx.x][threadIdx.y] = sqrt(module[threadIdx.x][threadIdx.y]);
        computedModule[threadIdx.x][threadIdx.y] = true;
    }



It seems to work, but is it correct?

Should I also check for bank conflicts?

Thank you in advance.


It’s better to use


instead of


to reduce bank conflicts.
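
On the correctness question: threads (i,j,z) and (i,j,another_z) landing in different warps does not make an unsynchronized `+=` on the same shared-memory cell safe. Different warps of a block can be interleaved or run concurrently, so the read-modify-write can lose updates, and a flag check like `computedModule` can fire before every z-thread has added its term. Below is a minimal sketch of one possible race-free variant, not the original kernel. It assumes `REAL` is `float`, fixed example block sizes, a device that supports `atomicAdd` on `float` in shared memory (compute capability 2.0 or later), and a hypothetical flat `diff` array standing in for the `p2 - p1` values. It also indexes the shared array with `threadIdx.x` last, which is the usual way to keep consecutive threads of a warp on consecutive banks.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Example block sizes (assumptions, not taken from the original post).
#define BLOCK_X 16
#define BLOCK_Y 4
#define BLOCK_Z 4
typedef float REAL;  // assumption: REAL is float

// Hypothetical kernel: diff holds one (p2 - p1) value per thread, laid out
// as [z][y][x]; out receives one module value per (x, y) cell.
__global__ void moduleKernel(const REAL *diff, REAL *out)
{
    // threadIdx.x is the last (fastest-varying) index, so consecutive
    // threads of a warp touch consecutive shared-memory words.
    __shared__ REAL module[BLOCK_Y][BLOCK_X];

    const int tx = threadIdx.x, ty = threadIdx.y, tz = threadIdx.z;

    // Exactly one thread per cell zeroes the accumulator.
    if (tz == 0)
        module[ty][tx] = 0;
    __syncthreads();

    // All BLOCK_Z threads sharing (tx, ty) accumulate atomically.
    const REAL d = diff[(tz * BLOCK_Y + ty) * BLOCK_X + tx];
    atomicAdd(&module[ty][tx], d * d);
    __syncthreads();  // every contribution is in before the square root

    // Exactly one thread per cell takes the square root and stores it.
    if (tz == 0)
        out[ty * BLOCK_X + tx] = sqrtf(module[ty][tx]);
}

int main()
{
    const int n = BLOCK_X * BLOCK_Y * BLOCK_Z;
    const int cells = BLOCK_X * BLOCK_Y;
    REAL hDiff[n], hOut[cells];
    for (int i = 0; i < n; ++i)
        hDiff[i] = 1.0f;  // each cell then sums BLOCK_Z ones

    REAL *dDiff = nullptr, *dOut = nullptr;
    cudaMalloc(&dDiff, n * sizeof(REAL));
    cudaMalloc(&dOut, cells * sizeof(REAL));
    cudaMemcpy(dDiff, hDiff, n * sizeof(REAL), cudaMemcpyHostToDevice);

    moduleKernel<<<1, dim3(BLOCK_X, BLOCK_Y, BLOCK_Z)>>>(dDiff, dOut);
    cudaMemcpy(hOut, dOut, cells * sizeof(REAL), cudaMemcpyDeviceToHost);

    // With all inputs equal to 1 and BLOCK_Z = 4, each cell is sqrt(4) = 2.
    printf("module[0][0] = %f\n", hOut[0]);

    cudaFree(dDiff);
    cudaFree(dOut);
    return 0;
}
```

The two `__syncthreads()` calls separate the three phases (zero, accumulate, square root), so no boolean flag is needed. Whether the atomics cost more than a per-thread partial sum followed by a reduction over z would have to be measured on the actual kernel.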