Optimisation Strategies when running out of shared memory

Yordan_Zaykov · March 12, 2011, 12:05am

Hey guys,

So what is my best bet if I cannot fully utilise my GPU because I don’t have enough shared memory to launch enough threads per block? Say each task needs 2 KB of shared memory: with 16 KB available per block and each thread handling one task, I can run at most 8 threads per block.

SPWorley · March 12, 2011, 1:36am

There’s no way to answer without breaking down your problem to understand what it’s doing and how it could be done.

But an immediate observation: if you say each thread needs 2 KB of shared memory, your approach doesn’t map well to the GPU at all. You want not 8 threads per block (or per SM) but more like 256, or preferably 512, and then hundreds of those blocks.

Alternatively, take one of your thread’s tasks (and its 2 KB of data) and figure out how you might use at least 32 threads (one warp) to solve that problem in parallel. Then each warp of a block can work on what each thread works on now. But this gets into the whole “it depends what your problem is” conclusion.
