Lots of blocks, shared memory question

fierval · April 20, 2015, 2:59am

Suppose I intend to run a total of 1,000,000 threads. I have 1D blocks with blockDim = 512, which yields gridDim = 1954.
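For reference, the grid size above comes from a ceiling division. A minimal sketch in plain C++ follows; the function name `blocksNeeded` is hypothetical and not part of any CUDA API.

```cpp
#include <cassert>

// Number of blocks needed to cover totalThreads when each block holds
// threadsPerBlock threads (ceiling division).
long long blocksNeeded(long long totalThreads, int threadsPerBlock) {
    assert(totalThreads >= 0 && threadsPerBlock > 0);
    return (totalThreads + threadsPerBlock - 1) / threadsPerBlock;
}
```

With 1,000,000 threads and 512 threads per block this gives 1954 blocks. The last block is only partly needed (1954 * 512 = 1,000,448), so a kernel launched this way would normally guard against thread indices at or beyond 1,000,000.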

Now, each block allocates 1K of shared memory. I have 14 SMs. Since the limit for my GPU is 48K per SM, it is conceivable that with this many blocks about 139 of them (1954 / 14) could be assigned to each SM, meaning that I may end up exceeding the maximum shared memory limit.

Is this conclusion correct? Or are there other factors I am not taking into account?

Robert_Crovella · April 20, 2015, 3:14am

There are other factors you’re not taking into account.

Each SM has a limit on the number of threadblocks that can be resident (lower than 139) as well as a limit on the number of threads that can be resident (usually 1536 or 2048, depending on the GPU). So the thread limit alone would prevent more than 3-4 of your 512-thread threadblocks (1536 / 512 = 3, 2048 / 512 = 4) from being resident on an SM at any given time. New threadblocks would not become resident until previous ones had finished and released their shared memory allocation. So given your scenario, it appears that no more than 4K out of 48K of shared memory would be in use on any SM at any given time.

Furthermore, threadblocks are not issued to SMs until there are sufficient resources of all types necessary to support that threadblock. So even if your threadblocks were each using, say, 32KB of shared memory, that just means that shared memory would become the limiting factor, and no more than 1-3 threadblocks would be resident on an SM at any given time, depending on how much shared memory the SM has (only 1 with a 48KB-per-SM limit).
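The reasoning above amounts to taking the minimum over each per-SM resource limit. Here is a rough sketch in plain C++, under stated assumptions: the struct `SmLimits`, the function `residentBlocksPerSm`, and the example limit values are all hypothetical, and other resources such as registers are ignored. For a real kernel, the CUDA runtime function `cudaOccupancyMaxActiveBlocksPerMultiprocessor` reports the actual figure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical per-SM limits. Real values depend on the GPU and should be
// queried or looked up rather than hard-coded.
struct SmLimits {
    int maxThreads;           // resident-thread limit, e.g. 1536 or 2048
    int maxBlocks;            // resident-block limit (assumed value)
    std::size_t sharedBytes;  // shared memory per SM, e.g. 48 * 1024
};

// Upper bound on simultaneously resident blocks per SM, considering only
// threads, shared memory, and the block cap (registers are ignored here).
int residentBlocksPerSm(const SmLimits& sm, int threadsPerBlock,
                        std::size_t sharedPerBlock) {
    assert(threadsPerBlock > 0);
    int byThreads = sm.maxThreads / threadsPerBlock;
    int byShared = sharedPerBlock == 0
                       ? sm.maxBlocks
                       : static_cast<int>(sm.sharedBytes / sharedPerBlock);
    // A block is only issued when every resource is available, so the
    // tightest limit wins.
    return std::min({byThreads, byShared, sm.maxBlocks});
}
```

With assumed limits of 2048 threads, 16 blocks, and 48KB of shared memory per SM, 512-thread blocks using 1KB each are capped at 4 by the thread limit, so at most 4KB of shared memory is in use. Raising the per-block allocation to 32KB makes shared memory the limiting factor and the cap drops to 1.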

fierval · April 20, 2015, 4:43am

Ok. This clarifies it. Thank you very much.
