constant memory problem

Nico · January 27, 2010, 4:07pm

I’m facing a weird problem when using constant memory (see attached template project)

In my .cu file, constant memory is defined as:

constant float4 f1[8];
constant float4 f2[8];
constant float4 f3[8];
constant float4 f4[8];
constant float4 fd[4];

of which only fd is used in the test kernel.
When compiled like this, many tests fail.

If I write it as:
constant float4 fd[4];
constant float4 f1[8];
constant float4 f2[8];
constant float4 f3[8];
constant float4 f4[8];

then all tests pass.

If I comment one or more of [f1,f2,f3,f4], then all test pass also.

I’m using cuda 3.0 beta SDK and driver 195.30 on 64bit fedora 11. Same issue on 64 bit ubuntu 9.10

I also tried different drivers, but that didn’t help. Maybe it’s a cuda 3.0 beta or a pebcak problem :-)
I’d really appreciate it if someone could tell me why the first case doesn’t work.

Best regards,
Nico
template.zip (4.68 KB)

avidday · January 27, 2010, 4:13pm

Without looking at your code, my guess is that you have an out of bounds index somewhere to that fd constant memory array. When you declare other constant memory storage after it, your code is probably reading out of bounds, but into something “safe”, so no errors occur. When you declare fd last, it sits at the end of your legal constant memory block, which makes you code read out of bounds into somewhere it isn’t allowed, and it fails.

Just a theory mind you…

Nico · January 27, 2010, 4:26pm

Thanks for the reply avidday.

I’ve dealt with my fair share of out-of-bounds problems, so I already checked that :)

I’m accessing the constant memory in my kernel without a variable index, so it’s either

fd[0],fd[1],fd[2] or fd[3]

The other constant memory allocations aren’t used, they’re just there to illustrate the ‘bug’, so I guess it can’t be an out-of-bounds issue.

Cheers,

Nico

eyalhir74 · January 28, 2010, 8:23am

The problem is probably here:

cudaMemcpyToSymbol ("fd", tmpd, 4*sizeof(float4), 0, cudaMemcpyHostToDevice );

It should be fd without the quotes (“”):

cudaMemcpyToSymbol (fd, tmpd, 4*sizeof(float4), 0, cudaMemcpyHostToDevice );

Also this is also weird (altough probably working):

float4 tmpd[4];

	int* tmpdi = (int*)&(tmpd[0].x);

	tmpdi[ 0] = 0x424d49f5;

	tmpdi[ 1] = 0x40c9d0ff;

	tmpdi[ 2] = 0x3f208ee8;

	tmpdi[ 3] = 0x3e134f49;

	tmpdi[ 4] = 0x424a97e3;

	tmpdi[ 5] = 0x40e91b65;

	tmpdi[ 6] = 0x3f1f1d80;

	tmpdi[ 7] = 0x3e134f49;

	tmpdi[ 8] = 0x424e0a39;

	tmpdi[ 9] = 0x40cdd1c5;

	tmpdi[10] = 0xbe6ecfdc;

	tmpdi[11] = 0x3e134f49;

	tmpdi[12] = 0xc24ca2ff;

	tmpdi[13] = 0xc0e93cea;

	tmpdi[14] = 0xbeeb9dc7;

	tmpdi[15] = 0xbe114300;

Why don’t you just access tmpd[0].x = …, tmpd[0].y = …, … tmpd[3].z = … ??

eyal

iceberg · January 28, 2010, 9:27am

Nico , I tested your code compiled with CUDA 3.0 beta on windows 7 OS.

Indeed, it’s very weird. When changed the declaration “constant float4 fd[4];” position, I got different results.

I compared the .ptx files, but they are almost same except the declaration “constant float4 fd[4];” position.

I simplified the kernel function as following, but still I got wrong result:

__global__ void testKernel( float d, float* g_odata, int imageW, int imageH) 

{

	const   int ix = IMAD(blockDim.x, blockIdx.x, threadIdx.x);

	const   int iy = IMAD(blockDim.y, blockIdx.y, threadIdx.y);

	g_odata[IMAD(iy, imageW, ix)]=  d*fd[3].z + d*fd[3].w;

}

And, if I changed the last expression in this kernel function as following, I got correct result:

__global__ void testKernel( float d, float* g_odata, int imageW, int imageH) 

{

	const   int ix = IMAD(blockDim.x, blockIdx.x, threadIdx.x);

	const   int iy = IMAD(blockDim.y, blockIdx.y, threadIdx.y);

	g_odata[IMAD(iy, imageW, ix)]= d*(fd[3].z + fd[3].w);

}

Nico · January 28, 2010, 9:36am

Hi eyal,

The problem is not located in the cudaMemcpyToSymbol call. According to the reference manual you can pass either a character array or the parameter name.
I double checked it be uploading the data to GPU and then downloading it again to an array initialized to all zeroes and the returned data is correct.

The reason I’m allocating the float constant array as ints is because I wanted to fill the array in this trimmed down template project with the exact same values as my larger project where I first encountered this bug.

Cheers,
Nico

Nico · January 28, 2010, 9:41am

Thanks a lot for performing the tests iceberg.

It’s a weird problem indeed. My best guess is that it’s some sort of optimization bug.

When I change

[codebox]

ftmp.z = afd[0].z+bfd[1].z+cfd[2].z+dfd[3].z;

ftmp.w = afd[0].w+bfd[1].w+cfd[2].w+dfd[3].w;

[/codebox]

to

[codebox]

ftmp.z = afd[0].z+bfd[1].z+cfd[2].z+1.00001fd*fd[3].z;

ftmp.w = afd[0].w+bfd[1].w+cfd[2].w+1.00001fd*fd[3].w;

[/codebox]

then it passes my epsilon tests in all cases, even with fd in the last position.

N.

iceberg · January 29, 2010, 1:35am

I agree with you. I believe it’s a nvcc compiler 3.0 beta related bug.

Topic		Replies	Views
problem with constant memory Legacy PGI Compilers	4	4239	September 7, 2011
Problem setting __constant__ memory CUDA Programming and Performance	4	4182	May 23, 2007
__constant memory issues CUDA Programming and Performance	32	12929	September 15, 2010
constant memory usage problem unexpected behavior using constant memory CUDA Programming and Performance	15	9841	April 1, 2009
Constant Memory problem CUDA Programming and Performance	12	1541	January 30, 2015
__const__ writting with cudaMemcpyToSymbol CUDA Programming and Performance	15	11009	August 30, 2007
Tables not correct when using __constant__ CUDA Programming and Performance	12	14983	July 28, 2010
Constant memory allocation and initialization CUDA Programming and Performance	12	83035	November 20, 2010
Why is my __constant__ memory float array sometimes all zeros? CUDA Programming and Performance	3	876	May 30, 2019
I had a problem on using constant memory CUDA Programming and Performance	4	942	October 13, 2013

constant memory problem

Related topics