A question of using shared memory

I studied the matrix-matrix multiplication example, and I understand how shared memory is explicitly declared in the kernel function of the tiled matrix-matrix multiplication implementation.
Now I have some confusion about shared memory usage.

  1. If I do not explicitly declare any shared memory in the kernel function, the .cubin file still shows that a small amount of shared memory is used for each block. Why? And what is that shared memory used for?

  2. The CUDA programming guide says that in <<<Dg, Db, Ns>>>, Ns is the size of dynamically allocated shared memory. Could anybody give an example of a case where we need to dynamically allocate shared memory? And what is the difference between dynamic and static allocation? Also, it says that such dynamic memory is used by any variables declared as an external array… I’m confused about why “external array” — aren’t variables in shared memory supposed to be accessible only to threads within a block?

Sorry if the questions are confusing… any help in understanding this is appreciated. Thanks :)

  1. blockDim, gridDim and kernel arguments are passed into shared memory.

  2. Say you need 1 float of shared memory for each thread in the block, but you call your kernel with different block sizes. Then you need to use Ns to specify the amount of dynamic shared memory to allocate for each kernel launch. The only difference between dynamic and static allocation is whether the amount of shared memory allocated per block is determined by the compiler or by the caller of the kernel.
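A minimal sketch of that situation (the kernel name, data, and sizes here are made up for illustration): the array has no size in the kernel, and the host supplies the size as the third launch parameter, so the same compiled kernel works for any block size.

```cuda
#include <cuda_runtime.h>

// Each thread stages one float in shared memory, but the block size is
// not known at compile time, so the array is declared "extern" with no
// size and sized at launch via Ns.
__global__ void scale(float *data, float factor)
{
    extern __shared__ float buf[];   // size comes from the 3rd launch parameter

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[threadIdx.x] = data[i];
    __syncthreads();
    data[i] = buf[threadIdx.x] * factor;
}

int main()
{
    int threads = 256;                      // could be any value chosen at run time
    int blocks  = 16;
    size_t ns   = threads * sizeof(float);  // dynamic shared memory per block

    float *d_data;
    cudaMalloc(&d_data, blocks * threads * sizeof(float));
    scale<<<blocks, threads, ns>>>(d_data, 2.0f);  // Ns must match what the kernel uses
    cudaFree(d_data);
    return 0;
}
```

If the kernel had instead declared `__shared__ float buf[256];`, the size would be fixed by the compiler and a launch with a larger block would overrun the array.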

See the code examples for the external array bit. You declare a dynamic shared memory array like this: “extern __shared__ float shared[];” It seems an odd syntax, but it does make sense, as the shared array is technically defined external to the compilation unit.

Thanks a lot! It helps clarify my questions.

I’m not sure which example you are talking about; I cannot find the “external array bit” in the CUDA code samples. Could you kindly give the exact name or a link? Thanks!

I was referring to the CUDA programming guide, section 4.2.2.3, where it has exactly: “extern __shared__ float shared[];”

You can also probably find some examples of this in the SDK samples, though I’m not sure which ones use dynamic shared memory. You can always grep the SDK directory for “extern __shared__”.

Thanks for the reply…

I read in the programming guide, section 4.2.2.2, that for variables declared in shared memory as an external array (“extern __shared__ float shared[]”), the size of the array is determined at launch time.

Here I’m confused about “determined at launch time”: does that mean the shared memory size per block is unknown before the program starts running? Then how does the compiler decide the number of parallel blocks and whether they fit on a streaming multiprocessor (like what the occupancy calculator does)?

Thanks.

-Y

You answered this question yourself in your original post.

Hence the compiler does not determine the amount of extern shared memory; your program does so in software and passes it to the driver when launching the kernel. Only the statically declared shared memory usage is recorded in the .cubin, which is why the compiler alone cannot know the total per-block footprint for a kernel that uses a dynamic array.
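To make that concrete, here is a sketch (kernel and variable names are hypothetical): the same compiled kernel can be launched repeatedly with different amounts of dynamic shared memory, and the driver receives the size with each launch; nothing about it is fixed in the .cubin.

```cuda
#include <cuda_runtime.h>

__global__ void sum_blocks(const float *in, float *out)
{
    extern __shared__ float sdata[];   // size unknown to the compiler

    sdata[threadIdx.x] = in[blockIdx.x * blockDim.x + threadIdx.x];
    __syncthreads();

    // Simple tree reduction over the block (assumes blockDim.x is a power of 2).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            sdata[threadIdx.x] += sdata[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = sdata[0];
}

void run(const float *d_in, float *d_out, int blocks, int threads)
{
    // Per-block shared memory size decided here, at launch time, by the host.
    // A different call to run() with a different 'threads' reuses the same
    // compiled kernel with a different Ns.
    size_t ns = threads * sizeof(float);
    sum_blocks<<<blocks, threads, ns>>>(d_in, d_out);
}
```

Occupancy is then checked per launch: the hardware limit applies to static plus dynamic shared memory, so a larger Ns can reduce the number of blocks that fit on a multiprocessor, or make the launch fail if the total exceeds the per-block limit.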