Max shared memory

Gregory_Diamos · December 3, 2008, 5:46am

I’m working with an algorithm that recursively splits an array into smaller sections until they fit into shared memory, and then processes each section there. The sections must have power-of-2 sizes.
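The thread doesn't show the splitting code itself. As a purely hypothetical sketch, assuming both the array length and the shared-memory capacity (in elements) are powers of two, the host-side recursive split could look like this. `Section` and `splitToFit` are illustrative names, not part of any CUDA API.

```cpp
#include <cstddef>
#include <vector>

// One piece of the input array: offset and length, both in elements.
struct Section {
    std::size_t offset;
    std::size_t length;
};

// Hypothetical helper: recursively halve [offset, offset + length) until each
// piece holds at most `capacity` elements. Assumes `length` and `capacity`
// are powers of two, so every resulting section is also a power of two.
void splitToFit(std::size_t offset, std::size_t length, std::size_t capacity,
                std::vector<Section>& out) {
    if (length <= capacity) {
        out.push_back({offset, length});  // small enough for one shared-memory tile
        return;
    }
    std::size_t half = length / 2;
    splitToFit(offset, half, capacity, out);         // left half
    splitToFit(offset + half, half, capacity, out);  // right half
}
```

For example, splitting a 65,536-element array with a 2,048-element capacity yields 32 sections of 2,048 elements each, in order of increasing offset.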

I’ve noticed that a bit of shared memory seems to be reserved when a kernel is launched. For example, I get out-of-memory errors when doing something like this:

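int device;
cudaDeviceProp properties;

cudaGetDevice( &device );
cudaGetDeviceProperties( &properties, device );

// Request the full per-block shared memory size as dynamic shared memory.
// This launch fails with an out-of-memory error.
foo<<< grid, block, properties.sharedMemPerBlock >>>( );
cudaError_t err = cudaGetLastError( );  // reports the launch failure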

Requesting slightly less (about 100 bytes less) works. However, because my algorithm needs a power-of-2 shared memory size, giving up even a few bytes of the card’s total forces me to drop down to half as much.
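To make the halving concrete, here is a small sketch of the arithmetic. It assumes a 16 KB `sharedMemPerBlock` and treats the roughly 100 reserved bytes measured above as a fixed figure; the real reserved amount is not documented here and may vary per kernel. `largestPow2AtMost` is a hypothetical helper name.

```cpp
#include <cstddef>

// Largest power of two that is <= n (returns 0 when n == 0).
// Doubling stops once another doubling would exceed n.
std::size_t largestPow2AtMost(std::size_t n) {
    if (n == 0) return 0;
    std::size_t p = 1;
    while (p <= n / 2) p *= 2;
    return p;
}
```

With the full 16,384 bytes the helper returns 16,384, but `largestPow2AtMost(16384 - 100)` returns 8,192: losing about 100 bytes costs half of the usable power-of-2 tile.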

Is there any way to reclaim this memory?

I’ve looked through the PTX assembly and didn’t see any extra declarations, so I figure the memory is being reserved either by the JIT compiler or by the runtime. Is there a flag I can pass to either of them to make it use global or local memory instead of shared memory? If not, will there be any consequences if I just allocate slightly less and clobber a small part of shared memory?

SPWorley · December 3, 2008, 8:42am

If I remember correctly, shared memory is used to store thread/block IDs as well as kernel parameters. It’s not much, but it does steal those bytes from your total. If you search this forum, you might find a discussion from about a year ago where someone worked out what those extra values are, where they sit, and how they are padded. There’s likely no way around the loss…

One way to help might be to split your data a few more times so that several blocks can run simultaneously. You can’t use a single 16 KB block, but perhaps 7 blocks of 2 KB each would fit. Whether that helps depends on whether your algorithm is more efficient as 7 blocks of 2 KB or as 1 block of 8 KB.
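The "7 blocks of 2 KB" figure follows from simple division. This sketch assumes 16 KB of shared memory per multiprocessor and, as a loose assumption based on the ~100 bytes measured earlier in the thread, that each resident block loses that much to reserved data. It ignores allocation granularity and the other per-multiprocessor limits (registers, threads, maximum resident blocks), so the result is only an upper bound. `blocksPerMultiprocessor` is a hypothetical name.

```cpp
#include <cstddef>

// Hypothetical upper-bound estimate: how many blocks fit in one
// multiprocessor's shared memory when each block needs `tileBytes` for its
// data plus `reservedPerBlock` bytes that the system keeps for itself.
std::size_t blocksPerMultiprocessor(std::size_t smemPerSM,
                                    std::size_t tileBytes,
                                    std::size_t reservedPerBlock) {
    return smemPerSM / (tileBytes + reservedPerBlock);
}
```

Eight 2 KB tiles would use exactly 16,384 bytes, leaving nothing for the reserved bytes, so `blocksPerMultiprocessor(16384, 2048, 100)` gives 7. The same overhead means only one 8 KB block fits, and a full 16 KB block gives 0.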

E.D_Riedijk · December 3, 2008, 1:25pm

According to page 28 of nvcc2.[01].pdf, thread/grid index information is stored in local memory. This is also different from what I thought…
