Find the limit of shared memory that can be used per block

giordi91 · March 17, 2017, 2:12am

Hi there!
I am currently studying a paper on histograms, where of course they want to maximize the amount of shared memory they can use (which increases the number of bins they can have).
This led me to a question I could not find an answer to.

Let's say my card has N KB of shared memory per SM, and the card can map at most M blocks per SM at a time.
My question is: while writing my kernel, should I program as if I have N KB of shared memory available per block, with the driver simply mapping fewer blocks when there is not enough shared memory for several of them, or is the effective shared memory I can use N/M KB per block?

Any info on the matter would be really appreciated.

M.

Robert_Crovella · March 17, 2017, 2:58am

This is generally a tradeoff, which may affect occupancy, which in turn may affect performance.

Briefly, if you used the maximum of 48 KB per block, then you would have a max occupancy of 1 block per SM. This might not give the best performance, so using less per block (e.g. 32 KB or 16 KB) might yield substantial increases in performance.
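
To make the arithmetic concrete, here is a rough Python sketch of the shared-memory side of the occupancy calculation. The 64 KB per SM, the 48 KB per-block limit, and the cap of 32 resident blocks per SM are assumed figures for illustration, not values read from any particular card, and `resident_blocks_per_sm` is a hypothetical helper name. It ignores registers, threads per block, and allocation granularity, all of which also limit occupancy.

```python
KB = 1024

def resident_blocks_per_sm(smem_per_block,
                           smem_per_sm=64 * KB,         # assumed shared memory per SM
                           max_smem_per_block=48 * KB,  # assumed per-block limit
                           max_blocks_per_sm=32):       # assumed cap on resident blocks
    """Estimate how many blocks fit on one SM, looking at shared memory only."""
    if smem_per_block > max_smem_per_block:
        # A request above the per-block limit cannot be launched at all.
        return 0
    if smem_per_block == 0:
        # No shared memory used: only the block-count cap applies.
        return max_blocks_per_sm
    # Otherwise the SM holds as many blocks as its shared memory allows,
    # up to the block-count cap.
    return min(max_blocks_per_sm, smem_per_sm // smem_per_block)

for kb in (48, 32, 16):
    print(f"{kb} KB per block -> {resident_blocks_per_sm(kb * KB)} block(s) per SM")
```

Under these assumptions the three sizes give 1, 2 and 4 resident blocks per SM respectively. In other words, each block may use up to the per-block limit, and the number of co-resident blocks shrinks accordingly; the budget is not a fixed N/M per block.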

The CUDA occupancy calculator may be interesting to experiment with.

giordi91 · March 17, 2017, 3:14am

Yes, I was able to guess from the paper that occupancy was going to be a problem. What I wanted to know was whether the kernel would outright crash, complaining that I am asking for too many resources, or whether the driver would manage it by mapping fewer blocks per SM (with lower occupancy, that is). Your answer confirms my guess: the driver will take care of it, so as long as each block does not ask for more than the maximum shared memory per block, I won't get a crash.
Thanks a lot for your answer.

M.
