solutions to local memory limit

This is a question about the limitations of local memory and how to work around them.

Though it might not be relevant, I will briefly give the background to my particular application.
I’m in the process of porting a sequential C code to CUDA in order to parallelise it. The application
is basically a special case of particle filtering.
Loosely speaking, what I’m trying to achieve is this:
given a set of N samples (particles), each associated with a particular weight, do some processing on each particle to produce an updated
weight set, which is then used to generate the next set of samples.
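In (simplified) C, the sequential step looks roughly like this. The function names and the toy weight update are made up by me for illustration; process_particle() stands in for the real, much heavier per-particle computation:

```c
/* Toy sketch of one sequential filter step: one pass over N particles,
   recompute each weight, then normalise so the weights sum to 1. */

static double process_particle(double sample, double weight)
{
    /* Placeholder for the real, expensive per-particle update. */
    return weight * (1.0 + sample * sample);
}

void filter_step(double *samples, double *weights, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        weights[i] = process_particle(samples[i], weights[i]);
        total += weights[i];
    }
    for (int i = 0; i < n; i++)
        weights[i] /= total;   /* normalised weights drive resampling */
}
```

Each loop iteration is independent, which is why the per-particle mapping to threads seems natural.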

So I see a straightforward way to parallelise: process each particle in parallel.

However, processing each particle requires quite a bit of computation and memory.
When I try to compile the code I get the following error:
“ptxas info : Compiling entry function ‘Z5readSPdS_S_PiS0_S0_S0_S_S_S_S0_S_S_S_S_S_S0_S_S_S’ for ‘sm_13’
ptxas info : Used 117 registers, 29936+0 bytes lmem, 160+16 bytes smem, 144 bytes cmem[0], 220 bytes cmem[1], 20 bytes cmem[14]
ptxas error : Entry function ‘Z5readSPdS_S_PiS0_S0_S0_S_S_S_S0_S_S_S_S_S_S0_S_S_S’ uses too much local data (0x74f0 bytes, 0x4000 max)”

I think this is caused by some per-thread data structures exceeding the local memory limit: the kernel needs 0x74f0 (29936) bytes of lmem, but the maximum is 0x4000 (16 KB) per thread on sm_13.

So my first question simply boils down to: what are the potential solutions to this problem? (I don’t mind trading away a bit of efficiency.)

This forum thread suggests
allocating enough memory from the host and using an offset (block ID, etc.) to divide that memory among the parallel blocks.
That leads to my second question: how do I allocate memory from the host so that I can use the solution given in the thread?
Is it simply a matter of allocating with cudaMalloc in a host function and then passing the resulting pointers to the kernel?
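To make sure I understand the suggestion, is it essentially the following pattern? (A sketch with names I made up; PER_PARTICLE_DOUBLES would be whatever scratch space one particle actually needs, and error checking is omitted.)

```cuda
#include <cuda_runtime.h>

#define PER_PARTICLE_DOUBLES 512   /* made-up per-particle scratch size */

/* Each thread carves its own slice out of one big global-memory buffer,
   instead of declaring large local arrays inside the kernel. */
__global__ void processParticles(double *scratch, double *weights, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    /* This thread's private slice of the shared scratch buffer: */
    double *my = scratch + (size_t)i * PER_PARTICLE_DOUBLES;

    /* ...use my[0 .. PER_PARTICLE_DOUBLES-1] where the local arrays were... */
    my[0] = weights[i];          /* placeholder work */
    weights[i] = my[0] * 2.0;
}

void launch(double *d_weights, int n)
{
    double *d_scratch;
    cudaMalloc((void **)&d_scratch,
               (size_t)n * PER_PARTICLE_DOUBLES * sizeof(double));

    int threads = 128;
    int blocks  = (n + threads - 1) / threads;
    processParticles<<<blocks, threads>>>(d_scratch, d_weights, n);

    cudaFree(d_scratch);
}
```

i.e. one cudaMalloc on the host sized for all N particles, and the kernel computes each thread's offset from its block and thread indices. Is that the intended approach, and are there pitfalls (coalescing, alignment) I should watch for?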