Local Arrays and Dynamic Parallelism

Dear All

I want to use Dynamic Parallelism on a K40. The code below produces the following message twice:

Error: a pointer to local memory cannot be stored into the parameter buffer, obtained at…/mix.cu(1649)

The error occurs at line 25 of the code below.

Does anyone know how I can avoid allocating “FR” and “coef” in the main GPU memory?
If I allocate FR and coef in the main GPU memory, performance degrades significantly.

Thanks

Luis Gonçalves

__global__ void process1(int nusersvirt, int iter, float *real_codigo, float *imag_codigo, complex1 *out1,
                         float *fft_in_real, float *fft_in_imag, int nredund, float *eqin, int varu, int baixo, int cima)
{

    int banda1 = threadIdx.x + blockIdx.x * blockDim.x;
    int brwidth = blockDim.x * gridDim.x;
    int i, form1, INFO = 0, i1;
    float d1, d2, d3, d4;
    complex1 m1, m2, m5, m6;
    complex1 a[171];
    complex1 RO[18];
    int form, z, z1, banda4, banda5;
    int banda3, ind2;
    int shift = 0;
    int index1;

    complex1 FR[(MAXUSERS-1)*MAXREDUND];
    float coef[MAXREDUND];


    banda3 = nredund * (nusersvirt - 1);

    fr1<<<banda3/32 + (((banda3 % 32) == 0) ? 0 : 1), 32>>>(FR, coef, eqin, real_codigo, imag_codigo, varu, baixo, cima, nusersvirt, iter, banda1, SYMB);

http://devblogs.nvidia.com/parallelforall/cuda-dynamic-parallelism-api-principles/

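Search that page for “local memory” and you’ll see that a pointer to local memory cannot be passed to a child kernel. FR and coef are per-thread local arrays in your parent kernel, so passing them to fr1 triggers the error once for each pointer.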
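
One possible workaround, sketched here rather than taken from your code, is to give each parent thread a slice of a buffer allocated with cudaMalloc and pass that slice to the child. All names below (parent, child, scratchPool, SCRATCH_PER_THREAD, run_scratch_demo) are hypothetical, and the child just writes a recognisable pattern so the host can check it. The host waits for the parent grid, which does not complete until its child grids have finished.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-thread scratch size, standing in for (MAXUSERS-1)*MAXREDUND.
#define SCRATCH_PER_THREAD 64

// Child kernel: fills the scratch slice that belongs to one parent thread.
__global__ void child(float *scratch, int n, int parentId)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        scratch[i] = parentId * 1000.0f + i;
}

// Parent kernel: uses a slice of a global-memory pool instead of a local array,
// so the pointer handed to the child is a legal kernel argument.
__global__ void parent(float *scratchPool, int n)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    float *myScratch = scratchPool + (size_t)tid * SCRATCH_PER_THREAD;
    child<<<(n + 31) / 32, 32>>>(myScratch, n, tid);
}

// Runs the demo; returns the number of wrong elements, or -1 on a CUDA error.
int run_scratch_demo()
{
    const int parentThreads = 4;
    const int n = SCRATCH_PER_THREAD;
    const size_t count = (size_t)parentThreads * n;

    float *pool = nullptr;
    if (cudaMalloc(&pool, count * sizeof(float)) != cudaSuccess)
        return -1;

    parent<<<1, parentThreads>>>(pool, n);
    // Waiting for the parent grid also waits for every child grid it launched.
    if (cudaGetLastError() != cudaSuccess || cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(pool);
        return -1;
    }

    std::vector<float> host(count);
    cudaMemcpy(host.data(), pool, count * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(pool);

    int mismatches = 0;
    for (int t = 0; t < parentThreads; ++t)
        for (int i = 0; i < n; ++i)
            if (host[(size_t)t * n + i] != t * 1000.0f + i)
                ++mismatches;
    return mismatches;
}

int main()
{
    int bad = run_scratch_demo();
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Dynamic parallelism needs relocatable device code and the device runtime library, for example nvcc -arch=sm_35 -rdc=true demo.cu -lcudadevrt for a K40. If the parent has to read the child's results, it needs additional synchronisation that this sketch leaves out.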

If you want to improve performance, make sure your reads from global memory are coalesced.
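
As a rough illustration of what coalescing means for per-thread scratch data, the sketch below stores the same values in two layouts. In the sliced layout each thread owns a contiguous run, so neighbouring threads read addresses ELEMS floats apart. In the interleaved layout element k of thread t sits at buf[k * nThreads + t], so neighbouring threads read neighbouring addresses. The names (sumSliced, sumInterleaved, compare_layouts, ELEMS) are hypothetical and the code only checks that both layouts give the same sums; it makes no timing claim, so profile your own kernel to see whether the layout matters.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define ELEMS 8  // hypothetical number of scratch elements per thread

// Sliced layout: thread t reads buf[t*ELEMS + k]; adjacent threads are ELEMS floats apart.
__global__ void sumSliced(const float *buf, float *out, int nThreads)
{
    int t = threadIdx.x + blockIdx.x * blockDim.x;
    if (t >= nThreads) return;
    float s = 0.0f;
    for (int k = 0; k < ELEMS; ++k)
        s += buf[t * ELEMS + k];
    out[t] = s;
}

// Interleaved layout: thread t reads buf[k*nThreads + t]; adjacent threads read adjacent floats.
__global__ void sumInterleaved(const float *buf, float *out, int nThreads)
{
    int t = threadIdx.x + blockIdx.x * blockDim.x;
    if (t >= nThreads) return;
    float s = 0.0f;
    for (int k = 0; k < ELEMS; ++k)
        s += buf[k * nThreads + t];
    out[t] = s;
}

// Returns the number of wrong sums across both layouts, or -1 on a CUDA error.
int compare_layouts()
{
    const int nThreads = 256;
    const size_t bytes = (size_t)nThreads * ELEMS * sizeof(float);
    std::vector<float> sliced(nThreads * ELEMS), inter(nThreads * ELEMS);
    std::vector<float> outA(nThreads), outB(nThreads);

    // The logical value of element k of thread t is t + k in both layouts.
    for (int t = 0; t < nThreads; ++t)
        for (int k = 0; k < ELEMS; ++k) {
            sliced[t * ELEMS + k] = (float)(t + k);
            inter[k * nThreads + t] = (float)(t + k);
        }

    float *dBuf = nullptr, *dOut = nullptr;
    if (cudaMalloc(&dBuf, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&dOut, nThreads * sizeof(float)) != cudaSuccess) { cudaFree(dBuf); return -1; }

    cudaMemcpy(dBuf, sliced.data(), bytes, cudaMemcpyHostToDevice);
    sumSliced<<<nThreads / 64, 64>>>(dBuf, dOut, nThreads);
    cudaMemcpy(outA.data(), dOut, nThreads * sizeof(float), cudaMemcpyDeviceToHost);

    cudaMemcpy(dBuf, inter.data(), bytes, cudaMemcpyHostToDevice);
    sumInterleaved<<<nThreads / 64, 64>>>(dBuf, dOut, nThreads);
    cudaMemcpy(outB.data(), dOut, nThreads * sizeof(float), cudaMemcpyDeviceToHost);

    cudaError_t err = cudaGetLastError();
    cudaFree(dBuf);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int t = 0; t < nThreads; ++t) {
        float expected = (float)(ELEMS * t + ELEMS * (ELEMS - 1) / 2);
        if (outA[t] != expected) ++mismatches;
        if (outB[t] != expected) ++mismatches;
    }
    return mismatches;
}

int main()
{
    int bad = compare_layouts();
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```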