shared memory

Kiarash · June 13, 2011, 7:42pm

Hi,

If we declare an array with the __shared__ qualifier inside a __global__ function, is it defined again for each thread, or only once? I know shared memory belongs to the whole block, but I don't know where I should declare it.

In the code below, it's declared inside a loop within the __global__ function. Is this how it should be done?

__global__ void MatMulKernel(Matrix A, Matrix B, Matrix C) {
	int blockRow = blockIdx.y;
	int blockCol = blockIdx.x;

	Matrix Csub = GetSubMatrix(C, blockRow, blockCol);

	float Cval = 0;

	int row = threadIdx.y;
	int col = threadIdx.x;

	for (int m = 0; m < (A.width / BLOCK_SIZE); ++m) {
		Matrix Asub = GetSubMatrix(A, blockRow, m);
		Matrix Bsub = GetSubMatrix(B, m, blockCol);

		__shared__ float As[BLOCK_SIZE][BLOCK_SIZE];
		__shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];

		As[row][col] = GetElement(Asub, row, col);
		Bs[row][col] = GetElement(Bsub, row, col);

		__syncthreads();

		for (int e = 0; e < BLOCK_SIZE; ++e) {
			Cval += As[row][e] * Bs[e][col];
		}

		__syncthreads();
	}

	SetElement(Csub, row, col, Cval);
}

Thanks for your help!

tera · June 14, 2011, 11:35am

As you already said, shared memory is allocated per block, i.e. all threads of a block see identical values for the shared memory (if __syncthreads() is used properly to synchronize accesses).
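
To make the per-block behaviour concrete, here is a minimal sketch. The kernel name `tagKernel`, the host helper `sharedIsPerBlock`, and the launch sizes are made up for illustration and are not from the code in the question. One thread per block writes a block-specific value into a `__shared__` variable, and after `__syncthreads()` every thread of that block reads it back.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each block gets its own copy of `tag`; all threads of a block share that copy.
__global__ void tagKernel(int *out) {
    __shared__ int tag;              // one instance per block, not per thread
    if (threadIdx.x == 0) {
        tag = 100 + blockIdx.x;      // a single thread per block writes
    }
    __syncthreads();                 // make the write visible to the whole block
    out[blockIdx.x * blockDim.x + threadIdx.x] = tag;
}

// Hypothetical host helper: true if every thread saw its own block's tag.
bool sharedIsPerBlock(int blocks = 4, int threads = 32) {
    int n = blocks * threads;
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;

    tagKernel<<<blocks, threads>>>(d_out);

    std::vector<int> h(n);
    cudaError_t err = cudaMemcpy(h.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    // Thread i belongs to block i / threads and should have read 100 + that index.
    for (int i = 0; i < n; ++i) {
        if (h[i] != 100 + i / threads) return false;
    }
    return true;
}
```

Calling `sharedIsPerBlock()` from a host `main` should return true on a working device: threads in block 0 all read 100, threads in block 1 all read 101, and so on, which is what you would expect if the variable exists once per block.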

Often shared variables are declared at the beginning of a kernel, although your use is fine as well. Trying to declare a shared variable outside of a kernel will just give an error.
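
For comparison, a rough sketch of the same tiled multiply with the two `__shared__` tiles declared at the top of the kernel instead of inside the loop. This is not the code from the question: it uses raw row-major pointers rather than the `Matrix` struct, and `TILE`, `matMulTiled`, and `hoistedTilesMatchCpu` are names chosen for this example. It also assumes square matrices whose size is a multiple of `TILE`.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

#define TILE 16  // assumed tile width; plays the role of BLOCK_SIZE in the question

// C = A * B for square n x n row-major matrices, n a multiple of TILE.
__global__ void matMulTiled(const float *A, const float *B, float *C, int n) {
    // Declared once at the top of the kernel: one pair of tiles per block.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int m = 0; m < n / TILE; ++m) {
        As[threadIdx.y][threadIdx.x] = A[row * n + m * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(m * TILE + threadIdx.y) * n + col];
        __syncthreads();   // tiles fully loaded before any thread reads them
        for (int e = 0; e < TILE; ++e) {
            acc += As[threadIdx.y][e] * Bs[e][threadIdx.x];
        }
        __syncthreads();   // all reads finished before the next overwrite
    }
    C[row * n + col] = acc;
}

// Hypothetical host helper: runs the kernel and compares against a CPU loop.
bool hoistedTilesMatchCpu(int n = 64) {
    size_t bytes = static_cast<size_t>(n) * n * sizeof(float);
    std::vector<float> a(n * n), b(n * n), c(n * n, 0.0f);
    for (int i = 0; i < n * n; ++i) {
        a[i] = (i % 7) * 0.5f;
        b[i] = static_cast<float>(i % 5) - 2.0f;
    }

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dC, bytes) == cudaSuccess &&
              cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(TILE, TILE), grid(n / TILE, n / TILE);
        matMulTiled<<<grid, block>>>(dA, dB, dC, n);
        ok = cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA); cudaFree(dB); cudaFree(dC);   // freeing a null pointer is a no-op

    for (int r = 0; ok && r < n; ++r) {
        for (int col = 0; ok && col < n; ++col) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k) ref += a[r * n + k] * b[k * n + col];
            ok = std::fabs(ref - c[r * n + col]) < 1e-3f;
        }
    }
    return ok;
}
```

The loop body is otherwise the same as in the question, including both `__syncthreads()` calls. Putting the declarations at the top is mostly a readability choice: it makes the block's shared memory footprint visible in one place.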
