why does performance scale with allocated shared memory size?

sleap · May 3, 2013, 4:14am

Hi,
I’m quite surprised to see a kernel take much longer to execute when allocating more shared memory, even if the memory is never used. Can anyone explain what’s going on? Is this expected and is it a hardware limitation?

SPWorley · May 3, 2013, 4:36am

Shared memory is a limited resource. Multiple blocks can run simultaneously on the same SM, but only if the SM has enough resources (shared memory, registers, and warp slots) for all of them. The shared memory you request is reserved for each block whether or not the kernel ever touches it. When you specify more shared memory per block, fewer blocks can be resident on the SM at once, so your occupancy drops. That leaves the SM with too little work to do and wastes its compute resources.
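To make the arithmetic concrete, here is a rough sketch of how the resident block count falls as the per-block shared memory request grows. The `SmLimits` struct, the function name, and the limit values in the usage note are illustrative assumptions, not taken from any particular GPU's datasheet; the sketch also ignores register pressure and shared memory allocation granularity, both of which matter on real hardware.

```cpp
#include <algorithm>

// Hypothetical per-SM limits. The values you plug in are assumptions;
// check the limits for your own device's compute capability.
struct SmLimits {
    int shared_bytes;  // shared memory available per SM
    int max_blocks;    // maximum resident blocks per SM
    int max_warps;     // maximum resident warps per SM
};

// Estimate how many blocks can be resident on one SM at the same time.
// Registers and allocation granularity are deliberately left out.
// Assumes threads_per_block is positive.
int resident_blocks(const SmLimits& sm, int threads_per_block,
                    int shared_bytes_per_block) {
    const int warps_per_block = (threads_per_block + 31) / 32;

    // Limit imposed by block slots and warp slots.
    int blocks = std::min(sm.max_blocks, sm.max_warps / warps_per_block);

    // Limit imposed by shared memory, if the block requests any.
    if (shared_bytes_per_block > 0) {
        blocks = std::min(blocks, sm.shared_bytes / shared_bytes_per_block);
    }
    return blocks;
}
```

Assuming an SM with 48 KB of shared memory, 8 block slots, and 48 warp slots (`SmLimits sm{48 * 1024, 8, 48};`), a 128-thread block requesting 4 KB fits 8 times, while the same block requesting 16 KB fits only 3 times, and at 48 KB only a single block is resident. The kernel code is identical in all three cases; only the reservation changed.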

Choosing the block size and the shared memory per block is part of tuning your kernel’s launch configuration. The CUDA programming guide discusses this at length, and there is an occupancy calculator to help.
