Different ways of using shared memory
Question about programming shared memory

ppofbb · December 6, 2007, 8:02pm

I read two samples, in which they use shared memory in different ways.

One is the matrix multiplication at the end of the programming guide. It declares the shared memory array in the device function as:

__global__ void Muld(...)
{
    ...
    __shared__ float As[BLOCK_SIZE][BLOCK_SIZE];
    __shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];
    ...
}

The other is the SobelFilter sample in the SDK projects. It declares the shared memory array at the start of the file, outside any function:

extern __shared__ unsigned char LocalBlock[];

Also, when launching the kernel from the host, it explicitly specifies the amount of dynamic shared memory for each block:

SobelShared<<<blocks, threads, sharedMem>>>...

What are the differences between these two methods? Why must the shared memory array be declared extern in the second case?

Any relevant information is appreciated.

DenisR · December 6, 2007, 9:57pm

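In the second case, the amount of shared memory can be decided at runtime: the array is declared extern without a size, and the third launch parameter (sharedMem in your example) supplies the number of bytes per block. In the first case, the size is fixed at compile time.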
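
To make the difference concrete, here is a minimal sketch that does the same array reversal both ways. It is not taken from the programming guide or the SDK sample: the kernel names, the reverseOnDevice helper, and the BLOCK_SIZE value of 64 are all made up for illustration. It assumes a single block and n no larger than the device's maximum threads per block.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK_SIZE 64  // hypothetical value; fixed when the file is compiled

// Static shared memory: the array size is a compile-time constant.
__global__ void reverseStatic(int *d)
{
    __shared__ int tile[BLOCK_SIZE];
    int t = threadIdx.x;
    tile[t] = d[t];
    __syncthreads();                 // wait until the whole tile is loaded
    d[t] = tile[BLOCK_SIZE - 1 - t];
}

// Dynamic shared memory: the array is declared extern with no size.
// Its size in bytes comes from the third launch parameter.
__global__ void reverseDynamic(int *d, int n)
{
    extern __shared__ int tile[];
    int t = threadIdx.x;
    tile[t] = d[t];
    __syncthreads();
    d[t] = tile[n - 1 - t];
}

// Hypothetical host helper: reverses 0..n-1 on the device with one block
// and returns true if the result is correct.
bool reverseOnDevice(int n, bool useDynamic)
{
    // The static kernel only works for the size it was compiled for.
    if (!useDynamic && n != BLOCK_SIZE) return false;

    std::vector<int> h(n);
    for (int i = 0; i < n; ++i) h[i] = i;

    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(d, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    if (useDynamic)
        reverseDynamic<<<1, n, n * sizeof(int)>>>(d, n);  // size chosen at runtime
    else
        reverseStatic<<<1, BLOCK_SIZE>>>(d);              // size baked into the kernel

    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(h.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h[i] != n - 1 - i) return false;
    return true;
}
```

With this sketch, reverseOnDevice(64, false) and reverseOnDevice(128, true) should both succeed, while the static kernel cannot handle 128 elements without recompiling with a different BLOCK_SIZE. The extern keyword is needed because an unsized array declaration does not allocate anything itself; it only names the per-block region whose size the launch supplies.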

ppofbb · December 7, 2007, 12:41am

Thanks!
