uses too much local data

I have a matrix[120000] and I get this error:
error : Entry function '_Z9addKernelPiPKiPdS2_S2_' uses too much local data (0xea600 bytes, 0x4000 max)

Does anyone know how to solve this problem? Thanks.


__global__ void addKernel(int *c, const int *f, double *GU2, double *GV2, double *GV22)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    double GU22[120000];
    c[i] = f[i];
    GU22[i] = GU2[i];
    GV22[i] = GV2[i];
}


This is the problem:

double GU22[120000];

120,000 doubles * 8 bytes/double = 960,000 bytes (the 0xea600 in the error message) = over 900 Kbytes

The amount of local data per thread is limited based on the GPU:

For cc1.x devices it is limited to 16 Kbytes per thread (the 0x4000 max in your error message). For other devices it is limited to 512 Kbytes per thread. Based on the exact error you are receiving, it looks like you are compiling for a cc1.x device. One way or another you'll need to reduce the size of that GU22 declaration. A straightforward approach would be to allocate this array in global memory instead of local memory.
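
A minimal sketch of the global-memory approach, assuming the per-thread array is only needed as a destination buffer. The names copyToGlobal and runGlobalBufferDemo are hypothetical and the kernel is reduced to the GU2/GU22 pair for brevity; it is not the original addKernel.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// GU22 is a pointer to a buffer in global memory (allocated on the host with
// cudaMalloc), not a 120000-element array private to every thread.
__global__ void copyToGlobal(const double *GU2, double *GU22, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {               // skip threads past the end of the data
        GU22[i] = GU2[i];
    }
}

// Hypothetical host helper: returns true if the copy on the device succeeded.
bool runGlobalBufferDemo()
{
    const int n = 120000;
    const size_t bytes = n * sizeof(double);
    std::vector<double> hostIn(n), hostOut(n, 0.0);
    for (int i = 0; i < n; ++i) hostIn[i] = 0.5 * i;

    double *dGU2 = NULL, *dGU22 = NULL;
    if (cudaMalloc((void **)&dGU2, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&dGU22, bytes) != cudaSuccess) {
        cudaFree(dGU2);
        return false;
    }
    cudaMemcpy(dGU2, &hostIn[0], bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;   // round up
    copyToGlobal<<<blocks, threads>>>(dGU2, dGU22, n);

    // This blocking copy also reports errors from the kernel launch.
    cudaError_t err = cudaMemcpy(&hostOut[0], dGU22, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dGU2);
    cudaFree(dGU22);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return false;
    }
    for (int i = 0; i < n; ++i)
        if (hostOut[i] != hostIn[i]) return false;
    return true;
}
```

Calling runGlobalBufferDemo() from main() should be enough to try it. The buffer costs 960,000 bytes once for the whole grid, rather than once per thread.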

If you can switch to a cc2.x or newer device (and compile for it), you should be able to use about half that much local memory, which fits under the 512 Kbyte limit, e.g.:

double GU22[60000];
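
Which limit applies depends on the compute capability (cc) of the device and on the architecture the code is compiled for. A minimal sketch for checking the device side, assuming device 0 is the GPU of interest; queryComputeCapability is a hypothetical helper name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reads the compute capability of device 0.
// Returns false if the runtime cannot query the device.
bool queryComputeCapability(int *major, int *minor)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return false;
    *major = prop.major;   // e.g. 3 for a cc3.0 device
    *minor = prop.minor;   // e.g. 0 for a cc3.0 device
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    return true;
}
```

The compile target is set separately, for example with nvcc's -arch option (such as -arch=sm_30 for a cc3.0 device, on toolkits that still support that architecture). In a Visual Studio CUDA project the equivalent setting is usually found under the CUDA C/C++ device properties as "Code Generation".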

While we’re at it, this combination of statements:

double GU22[120000];
GU22[i] = GU2[i] ;

seems like it could allow out-of-bounds accesses to GU22, depending on i (i.e. on the grid size), because nothing checks i against the array length.
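
A minimal sketch of the usual guard against this, assuming a 1D launch with 256 threads per block. The names blocksFor, guardedWrite and runGuardedDemo are hypothetical.

```cuda
#include <cuda_runtime.h>

// Ceiling division: the smallest number of blocks that covers n elements.
inline int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Each thread writes its own element; threads past the end do nothing.
__global__ void guardedWrite(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

// Hypothetical host helper: launches the guarded kernel over 120000 elements.
bool runGuardedDemo()
{
    const int n = 120000, threads = 256;
    int *dOut = NULL;
    if (cudaMalloc((void **)&dOut, n * sizeof(int)) != cudaSuccess) return false;
    guardedWrite<<<blocksFor(n, threads), threads>>>(dOut, n);
    cudaError_t err = cudaDeviceSynchronize();   // wait and collect errors
    cudaFree(dOut);
    return err == cudaSuccess;
}
```

With n = 120000 and 256 threads per block this launches 469 blocks, i.e. 120064 threads, so the last 64 threads would index past the end of the buffer without the i < n test.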

To txbob

Hello, this program does image processing, and I have 120000 pixels.
I don't know what cc2.x is. Would you explain cc2.x?

Hello, I looked up my GPU.
I use a GeForce GTX 660 (compute capability 3.0).
I use VS2013 and CUDA 5.5.

You only need that much local storage (local to a single thread) if you are processing all 120000 pixels in a single thread. Probably you don’t want to do that. Even if you did want to do that, it won’t work, so locate that array in global memory instead.
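
A minimal sketch of the one-pixel-per-thread pattern, where each thread needs only a scalar of local storage. The gain multiplication is a placeholder operation and the names scalePixels and runScalePixelsDemo are hypothetical.

```cuda
#include <vector>
#include <cuda_runtime.h>

// One thread per pixel: the only per-thread storage is a single double,
// which the compiler will normally keep in a register.
__global__ void scalePixels(const double *in, double *out, double gain, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        double pixel = in[i];
        out[i] = pixel * gain;
    }
}

// Hypothetical host helper: processes 120000 pixels and checks the result.
bool runScalePixelsDemo()
{
    const int n = 120000, threads = 256;
    const size_t bytes = n * sizeof(double);
    std::vector<double> hostIn(n), hostOut(n, 0.0);
    for (int i = 0; i < n; ++i) hostIn[i] = i;

    double *dIn = NULL, *dOut = NULL;
    if (cudaMalloc((void **)&dIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&dOut, bytes) != cudaSuccess) {
        cudaFree(dIn);
        return false;
    }
    cudaMemcpy(dIn, &hostIn[0], bytes, cudaMemcpyHostToDevice);

    scalePixels<<<(n + threads - 1) / threads, threads>>>(dIn, dOut, 2.0, n);

    cudaError_t err = cudaMemcpy(&hostOut[0], dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    if (err != cudaSuccess) return false;
    for (int i = 0; i < n; ++i)
        if (hostOut[i] != 2.0 * hostIn[i]) return false;
    return true;
}
```

The whole image lives in global memory, and the 120000 threads of the grid each touch one element of it.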

If you have no idea what I’m talking about, then I suggest you try some of the educational resources to learn CUDA. Such as here: