I am currently studying a paper on histograms, where the authors naturally want to maximize the amount of shared memory they can use (more shared memory means they can have more bins).
This led me to a question I could not find an answer to anywhere.
Let's say my card has N KB of shared memory per SM, and it can map at most M blocks per SM at a time.
My question is: when writing my kernel, should I assume that I have N KB of shared memory available per block, and that if there is not enough shared memory for multiple blocks to be mapped to the same SM, the driver simply won't map them? Or is the effective shared memory I can use only N/M KB per block?
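To make the two readings concrete, here is a small host-side C++ sketch of the arithmetic each one implies. It is only an illustration of my question, not a statement of how the hardware actually behaves: the function names are made up, and the model ignores every other limit (registers, threads per SM, allocation granularity, any separate per-block cap).

```cpp
#include <algorithm>
#include <cassert>

// Reading A (hypothetical): a block may use up to the full N KB of the SM.
// The number of blocks resident on one SM is then however many fit in N KB,
// capped at the hardware maximum M.
int resident_blocks_reading_a(int n_kb_per_sm, int m_max_blocks, int per_block_kb) {
    assert(per_block_kb > 0 && per_block_kb <= n_kb_per_sm);
    return std::min(m_max_blocks, n_kb_per_sm / per_block_kb);
}

// Reading B (hypothetical): shared memory is split evenly up front,
// so every block is limited to N / M KB regardless of what the others use.
int per_block_budget_reading_b(int n_kb_per_sm, int m_max_blocks) {
    assert(m_max_blocks > 0);
    return n_kb_per_sm / m_max_blocks;
}
```

With the made-up values N = 48 and M = 8, reading A would let a block take all 48 KB at the cost of only one block being resident per SM, while reading B would cap every block at 6 KB. Which of the two matches the real behavior is exactly what I am asking.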
Any information on the matter would be greatly appreciated.