Constant Memory problem

luisgo · January 29, 2015, 4:32pm

Dear All

http://cuda-programming.blogspot.pt/2013/01/what-is-constant-memory-in-cuda.html

I am having problems using constant memory. I followed the complete example at the link above. The only difference is that I pass the constant memory pointer (or a pointer offset from it) as a parameter to the kernels and use it inside them. I now get wrong results, which I did not get before making that change. My GPUs have compute capability 3.0 and 3.5.

__constant__ __device__ unsigned char input1[65512]; //for one antenna, must be allocated for more antennas

void main()
{
    unsigned char frameori[65512];

    //fill frameori with data

    cudaMemcpyToSymbol((void *)input1, (void *)frameori, 512+sizeof(float)*(NRSAMPLES*2*NUMBEROFANTENNAELEMENTS+NRSAMPLES*2*NUMBEROFANTENNAELEMENTS+SUBSET+NTAPS*NUSERS*NUMBEROFANTENNAELEMENTS*4)+sizeof(int)*NTAPS*NUSERS*NUMBEROFANTENNAELEMENTS,0, cudaMemcpyHostToDevice);

    cudaDeviceSynchronize();

    delaytran=(int *)(input1+512+sizeof(float)*(NRSAMPLES*2*nant+NRSAMPLES*2*nant+SUBSET+ntaps*nusers*nant*2));
    tapreal=(float *)(input1+512+sizeof(int)*ntaps*nusers*nant+sizeof(float)*(NRSAMPLES*2*nant+NRSAMPLES*2*nant+SUBSET+ntaps*nusers*nant*2));
    tapimag=(float *)(input1+512+sizeof(int)*ntaps*nusers*nant+sizeof(float)*(NRSAMPLES*2*nant+NRSAMPLES*2*nant+SUBSET+ntaps*nusers*nant*3));

    ciclo4<<<NRSAMPLES/32,32,0,stream[z5]>>>((complex1 *)(input1+512+sizeof(complex1)*(NRSAMPLES*nant)), timetotal,timeuser,comp1[z5],maxdelay+atraso,g,ntaps,nant,nusers,z5,
                                             tapreal,tapimag,delaytran);
}

__global__ void ciclo4(complex1 *frame1,complex1 *timetotal,complex1 *timeuser,complex1 *comp1,int maxdelayatraso,int g,
                       int ntaps,int nant,int nusers,int z5,
                       float *tapreal,float *tapimag,int *delaytran)
{
    int i1=blockIdx.x * blockDim.x + threadIdx.x;
    complex1 const7,const8,const9;
    complex1 *inri;
    complex1 *user,*total;
    int nr,t1,delay2;

    const9.r=0;
    const9.i=0;
    for(nr=0 ; nr < nant ; nr++)
    {
        inri=frame1+NRSAMPLES*nr;
        user=(timeuser+NRSAMPLES*(z5*nant+nr));
        total=(timetotal+NRSAMPLES*nr);

        for(t1=0; t1 < ntaps ;t1++)
        {
            delay2=maxdelayatraso-delaytran[t1+g*ntaps+nr*nusers*ntaps];
            if ((i1-delay2) >= 0)
            {
                const7.r=(inri+i1-delay2)->r-total[i1-delay2].r+user[i1-delay2].r;
                const7.i=(inri+i1-delay2)->i-total[i1-delay2].i+user[i1-delay2].i;
                const8.r=*(tapreal+g*ntaps+t1+nr*nusers*ntaps);
                const8.i=-*(tapimag+g*ntaps+t1+nr*nusers*ntaps);

                const9.r+=const7.r* const8.r - const7.i * const8.i;
                const9.i+=const7.r * const8.i + const7.i * const8.r;
            }
        }
    } //NRANTENNAS
    comp1[i1].r=const9.r;
    comp1[i1].i=const9.i;
}


Thanks

Luis Gonçalves

Robert_Crovella · January 29, 2015, 4:40pm

Don’t pass the constant memory pointer to the kernel. Doing so effectively violates the CUDA rule that host code cannot take the address of a device variable (or function).

The constant memory variable has module/translation-unit scope and can be used directly without passing it explicitly. If you need to use an offset version, pass the offset only, and add that offset to the variable (pointer) in the kernel code directly.
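
A minimal sketch of that suggestion, under stated assumptions: the names coeffs, scaleKernel and runOffsetExample are hypothetical and not from the code in this thread, and the buffer is declared as float to keep the example short. Only an integer offset crosses the host/device boundary; the pointer into constant memory is formed in device code.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical constant buffer; name and size are illustrative only.
__constant__ float coeffs[256];

// The kernel receives an element offset, not a pointer into constant memory.
__global__ void scaleKernel(float *out, int offset, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // The address is formed in device code from the symbol itself.
        const float *base = coeffs + offset;
        out[i] = 2.0f * base[i];
    }
}

// Host helper: fills constant memory with 0..255, runs the kernel,
// and returns out[0], or -1 if any CUDA call fails.
float runOffsetExample(int offset)
{
    const int n = 32;
    assert(offset >= 0 && offset + n <= 256);

    float host[256];
    for (int i = 0; i < 256; ++i) host[i] = (float)i;

    // The symbol is passed by name.
    if (cudaMemcpyToSymbol(coeffs, host, sizeof(host)) != cudaSuccess) return -1.0f;

    float *dOut = nullptr;
    if (cudaMalloc(&dOut, n * sizeof(float)) != cudaSuccess) return -1.0f;

    scaleKernel<<<1, n>>>(dOut, offset, n);

    float result[n];
    cudaError_t err = cudaMemcpy(result, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return (err == cudaSuccess) ? result[0] : -1.0f;
}
```

Calling runOffsetExample(10) makes thread 0 read coeffs[10], so the helper should return 2 * 10.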

luisgo · January 29, 2015, 5:20pm

May I cast a pointer derived from the __constant__ variable inside a kernel? For example:

__constant__ unsigned char input1[1000];

__global__ void kernel()
{
    float a;

    a=*((float *)(input1+offset)+offsetfloat);
}


njuffa · January 29, 2015, 5:28pm

Yes, you can create a pointer pointing to a location inside the constant array in the device code. Here is an example from the CUDA math library (file math_functions_dbl_ptx3.h):

static __constant__ double __cudart_sin_cos_coeffs[16] =
{
  [...]
};

static __forceinline__ double __internal_sin_cos_kerneld(double x, int i)
{    
  const double *coeff = __cudart_sin_cos_coeffs + 8 * (i & 1);
  [...]
}

However, the code you show looks risky. All data on the GPU must be naturally aligned; that is, the address of each data item must be a multiple of the item's size. Depending on the value of ‘offset’ in your code, the resulting pointer may not be suitably aligned to access a four-byte ‘float’.

luisgo · January 29, 2015, 5:50pm

“offset” is a multiple of “sizeof(float)”. Is it aligned? I hope so.

njuffa · January 29, 2015, 5:59pm

Note that it is also possible that input1 itself is not 4-byte aligned. It would probably be best to declare input1 as an array of ‘float’. You could also use an array of ‘uchar4’, or use the align attribute with the ‘unsigned char’ array.
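
A rough sketch of the align-attribute option, assuming a hypothetical byte buffer named packed and hypothetical helpers readFloat and roundTrip (none of these appear in the thread). The kernel checks the address before it dereferences it as a float, so a misaligned offset is reported instead of being read.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical buffer: raw bytes, forced to 4-byte alignment so that float
// views at offsets that are multiples of 4 are naturally aligned.
__align__(4) __constant__ unsigned char packed[1024];

// Reads one float located byteOffset bytes into the buffer.
// Reports whether the address was 4-byte aligned; reads only if it was.
__global__ void readFloat(int byteOffset, float *value, int *aligned)
{
    const unsigned char *p = packed + byteOffset;
    *aligned = (reinterpret_cast<uintptr_t>(p) % sizeof(float)) == 0;
    if (*aligned) {
        *value = *reinterpret_cast<const float *>(p);
    }
}

// Host helper: stores v at byteOffset, reads it back through the kernel.
// Returns true only if the address was aligned and the value matched.
bool roundTrip(float v, int byteOffset)
{
    assert(byteOffset >= 0 && byteOffset <= 1020);

    // The fourth argument of cudaMemcpyToSymbol is a byte offset into the symbol.
    if (cudaMemcpyToSymbol(packed, &v, sizeof(float), byteOffset) != cudaSuccess) return false;

    float *dValue = nullptr;
    int *dAligned = nullptr;
    if (cudaMalloc(&dValue, sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&dAligned, sizeof(int)) != cudaSuccess) { cudaFree(dValue); return false; }
    cudaMemset(dValue, 0, sizeof(float));

    readFloat<<<1, 1>>>(byteOffset, dValue, dAligned);

    float hValue = 0.0f;
    int hAligned = 0;
    cudaError_t e1 = cudaMemcpy(&hValue, dValue, sizeof(float), cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(&hAligned, dAligned, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dValue);
    cudaFree(dAligned);
    return e1 == cudaSuccess && e2 == cudaSuccess && hAligned == 1 && hValue == v;
}
```

With this layout, roundTrip(3.5f, 512) should succeed, while an offset such as 513 should be rejected by the alignment check. Declaring the buffer as an array of float removes the need for the attribute altogether.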

luisgo · January 29, 2015, 7:21pm

The code below gives the following error. I double-checked, and I think the indexing of input1 is right.

cudaCheckError() failed at D:/zipback/user/cuda/kernel.cu:1115 : invalid device symbol

static __align__(4) __constant__ unsigned char input1[65512]; 

int main( int argc, char *argv[ ] )
{
    for(z5=0;z5 < z7;z5++)
    {
        signature1<<<20,SYMB,0,stream[z5]>>>(comp1[z5],(framecod)+((z5 << 1))*(SUBSET),(framecod)+((z5 << 1)+1)*(SUBSET),real_codigo,imag_codigo,delaytran,amptran,fasetran,g,const1,ntaps,nant,nusers,z7,z5);
    }
}


__global__ void signature1(complex1 *comp1, float *pont1,float *pont2,float *real_codigo,float *imag_codigo,
                           int delaytran, int amptran, int fasetran,int g,float const1,int ntaps,int nant,int nusers,int z7,int z5)
{
    int nu,k,k1,nr,shift,tran1;
    float const5, const6,const7,const8,const9;

    k=blockIdx.x * blockDim.x + threadIdx.x;
    // k1=blockDim.x*gridDim.x;

    comp1[k].r=0;
    comp1[k].i=0;
    if (k>=HALFSUB)
    {
        k1=NRSAMPLES-SUBSET+k;
    }
    else
        k1=k;
    for(nr=0;nr<nant;nr++)
    {
        const6=0;
        const7=0;
        shift=g*ntaps+nr*nusers*ntaps;
        tran1=SUBSET*(z7*nr+z5)+k;
        for(nu=0; nu < ntaps ; nu++)
        {
            const5=(float)(k1)*const1*((float)(*((int *)(input1+delaytran+sizeof(int)*(shift+nu)))))+*((float *)(input1+fasetran+sizeof(float)*(shift+nu)));
            sincosf(const5, &const8, &const9);
            const5=*((float *)(input1+amptran+sizeof(float)*(shift+nu)));
            const6+=const5*const9;
            const7+=const5*const8;
        }
        *(real_codigo+tran1)=(const6* *(pont1+k)-const7* *(pont2+k));
        *(imag_codigo+tran1)=(const6* *(pont2+k)+const7* *(pont1+k));
    }
}

njuffa · January 29, 2015, 7:48pm

Which line in the above snippet corresponds to kernel.cu:1115? The code above is not a buildable sample that I could use to try to reproduce the issue. The compiler presumably also tells you the name of whatever object it thinks is not a device object. What is that symbol name, relative to the code you posted?

luisgo · January 29, 2015, 8:12pm

The error is a runtime error. I use some error-reporting code that I got from the internet; it retrieves the last error.

Line 1115 comes right after the call to the signature1 kernel in main, so I assumed the error is in the signature1 kernel.

njuffa · January 29, 2015, 8:29pm

The text of the error message suggests it pertains to a cudaMemcpyToSymbol() or similar API call, not a kernel invocation.
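
One way to make the failing call easier to pin down is to check the return code of every runtime API call where it happens, rather than only after a kernel launch. The sketch below assumes a hypothetical helper checkCuda, a macro CHECK_CUDA and a trivial kernel noop; it is not the error-checking code used in this thread.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints file and line of a failing call and
// returns the error code unchanged.
inline cudaError_t checkCuda(cudaError_t err, const char *file, int line)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s:%d: %s\n", file, line, cudaGetErrorString(err));
    }
    return err;
}
#define CHECK_CUDA(call) checkCuda((call), __FILE__, __LINE__)

// Trivial kernel used only to demonstrate the checks.
__global__ void noop() {}

// Launches the kernel and checks the launch and the execution separately.
cudaError_t launchChecked()
{
    noop<<<1, 1>>>();
    cudaError_t err = CHECK_CUDA(cudaGetLastError());   // launch errors
    if (err != cudaSuccess) return err;
    return CHECK_CUDA(cudaDeviceSynchronize());         // execution errors
}
```

Wrapping calls such as cudaMemcpyToSymbol() in the same macro would report the error at the line of the copy itself, instead of at a later check that happens to follow a kernel launch.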

luisgo · January 29, 2015, 9:31pm

You are right. The call that produces the error is below. The transfer size is less than 64 KB.

cudaMemcpyToSymbol((void *)input1, (void *)frameori, 512+sizeof(float)*(NRSAMPLES*2*NUMBEROFANTENNAELEMENTS+NRSAMPLES*2*NUMBEROFANTENNAELEMENTS+SUBSET+NTAPS*NUSERS*NUMBEROFANTENNAELEMENTS*4)+sizeof(int)*NTAPS*NUSERS*NUMBEROFANTENNAELEMENTS,0, cudaMemcpyHostToDevice);

njuffa · January 29, 2015, 9:52pm

I have never used cudaMemcpyToSymbol(), but this doesn’t look right to me:

cudaMemcpyToSymbol((void *)input1, (void *)frameori, ...);

From what I can tell from the documentation, the first argument should be just the symbol name:

cudaMemcpyToSymbol(input1, (void *)frameori, ...);
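
A small sketch of that form, assuming a hypothetical buffer named table and hypothetical helpers uploadTable, sumTable and uploadAndSum (the real code uses input1 and frameori). The symbol is passed by name, with no cast, and the return code is checked.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical constant buffer standing in for input1 from the thread.
__constant__ unsigned char table[4096];

// Copies count bytes from host memory into the constant buffer.
// The symbol is passed by name: no cast and no address-of operator.
cudaError_t uploadTable(const unsigned char *src, size_t count)
{
    return cudaMemcpyToSymbol(table, src, count, 0, cudaMemcpyHostToDevice);
}

// Sums the first n bytes of the constant buffer (single thread, for checking only).
__global__ void sumTable(int n, unsigned int *sum)
{
    unsigned int s = 0;
    for (int i = 0; i < n; ++i) s += table[i];
    *sum = s;
}

// Host helper: uploads n bytes of value 1 and returns the device-side sum,
// or -1 if any CUDA call fails.
long uploadAndSum(int n)
{
    assert(n >= 0 && n <= 4096);

    unsigned char host[4096];
    for (int i = 0; i < 4096; ++i) host[i] = 1;

    if (uploadTable(host, (size_t)n) != cudaSuccess) return -1;

    unsigned int *dSum = nullptr;
    if (cudaMalloc(&dSum, sizeof(unsigned int)) != cudaSuccess) return -1;

    sumTable<<<1, 1>>>(n, dSum);

    unsigned int hSum = 0;
    cudaError_t err = cudaMemcpy(&hSum, dSum, sizeof(hSum), cudaMemcpyDeviceToHost);
    cudaFree(dSum);
    return (err == cudaSuccess) ? (long)hSum : -1;
}
```

If the upload worked, uploadAndSum(100) should return 100, since the kernel sums 100 bytes that were each set to 1.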

luisgo · January 30, 2015, 4:00am

Thanks
