Per-thread local memory specified in the CUDA C Programming Guide

I have a question:
The CUDA C Programming Guide specifies the "per thread" local memory for CC 2.x devices as a maximum of 512 KB.
Does this mean that 512 KB is the maximum amount of memory that can be allocated per thread? In other words, is there at most 512 KB available per thread for automatic variables (both scalars and arrays) created on the stack?

Correct. Given the large number of concurrent threads (up to 1536 per SM, times the number of SMs on the device, which can be up to 16), this means that a significant portion of total GPU memory can be consumed by thread-local storage: at the limit, 512 KB × 1536 threads × 16 SMs = 12 GB. Note that by default the driver allocates a much smaller amount of per-thread local memory and adjusts the bound upward as needed. Programmers can also set the amount explicitly via cudaDeviceSetLimit(cudaLimitStackSize, bytes).
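A minimal host-side sketch of querying and raising the per-thread stack limit (the 64 KB value is purely illustrative; error handling follows the standard CUDA runtime pattern):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t stackSize = 0;

    // Query the driver's current default per-thread stack size.
    cudaDeviceGetLimit(&stackSize, cudaLimitStackSize);
    printf("Default per-thread stack: %zu bytes\n", stackSize);

    // Raise the limit to 64 KB per thread (illustrative value;
    // the CC 2.x architectural maximum is 512 KB per thread).
    cudaError_t err = cudaDeviceSetLimit(cudaLimitStackSize, 64 * 1024);
    if (err != cudaSuccess) {
        printf("cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaDeviceGetLimit(&stackSize, cudaLimitStackSize);
    printf("New per-thread stack: %zu bytes\n", stackSize);
    return 0;
}
```

Keep in mind that raising the limit causes the driver to reserve stack space for every resident thread, so a large value can noticeably reduce the device memory available for explicit allocations.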