Use of constant memory breaks with OpenCL 1.1 Constant memory worked fine in 1.0, but fails in 1.1

While I’m still investigating the problem, I have noticed that using constant memory in OpenCL 1.1 is broken, or at least for the way that I had it implemented. I’m using the 258.19 driver and the 3.1 sdk, from the developer channel.

Previously, a kernel initiated with the following code worked:

__kernel void kernel_fdk(

	__global float *dev_vol,

	__read_only image2d_t dev_img,

	__constant float *dev_matrix,

	__constant float4 *nrm,

	__constant float4 *vol_offset,

	__constant float4 *vol_pix_spacing,

	__constant int4 *vol_dim,

	__constant float2 *ic,

	__constant int2 *img_dim,

	__constant float *sad,

	__constant float *scale,

	__constant int4 *ndevice,

	int offset

)

However, now the only way to get it working is to declare all the constants as global. I realize that I’m using more constant memory types than allowed, (10 vs the 9 that my 250 GTS supports), but even reducing the number of constant arguments does not solve the problem. I even tried changing the vector data types to arrays and that didn’t solve the problem either. On the host, I’m creating a buffer and then writing data to it, in two separate actions. I realized that the SDK no longer has an example that followed that method, and thus used copy host pointer to copy the data upon creation. The failed too. Basically, any use of constant memory seems to fail.

Is there anything wrong with what I’m doing in accordance to OpenCL 1.1?

I noticed this problem too!
Anyway it seems not to be related to OpenCL 1.1, it happened to me upgrading from sdk 3.0 to 3.1 (upgrading also the devdriver to the 256.40 version for Linux 64-bit).

The compiler forced me to declare as __constant all the “global scope” variables (this is not a problem apart from the fact that the number of __constant variables is limited). After this modification my kernels compiled again, but the software no longer worked and after two days of debugging I discovered that all the content of __constant arrays used inside my kernels are messed up with apparently random data.

Declaring them as __global let the kernels work again (although slower), are these behaviors reported/explained somewhere? I could not find anything in the release notes.

Other problems maybe related were reported here:
http://forums.nvidia.com/index.php?showtopic=167021
http://forums.nvidia.com/index.php?showtop…p;#entry1073729

Various months ago also here:
http://www.khronos.org/message_boards/view…f=37&t=2148

This is how it should have always been. They are just more strict with enforcing specs now, a good thing.

If thats the case, how do you use constant memory?

I’m not sure if we mean the same thing.

I meant this:

//kernels.cl

__kernel void k1(...)

{

...

}

__global float gArray[1000]; //this should be invalid! program scope variables must be constant

__constant float cArray[] = { 1,2,3,4, }; //should be OK, no way to change content from host or device, must be compiled in

That’s how I understood “global scope”. You were probably referring to why you have to start passing arguments to kernel as globals and not constants - I don’t know.

Yes, it is a good thing and I’m happy with this, also because the compiler warned me about the problem and i could fix it in few seconds… the real problem is “the mess up” of the content of __constant variables used in the kernels caused too by the driver upgrade (with no errors or warnings from the compiler or at runtime).

Anyway the problem disappeared in my implementation, I want to do some more tests to understand why and then I’ll post here the solution…

SOLVED:

Ok, so, the problem is that declaring all the global scope variables as __constant I exceeded the maximum number of possible __constant variables and when this happens the content of the __constant variables will be random… but the situation is not so simple and I am not so stupid (not too much I hope) ;)

I did not think about I could exceed the maximum number of possible __constant variables, since I was taking into account only global scope variable and the __constant ones used in may kernel. It is needed instead to take into account all the __constant variables declared in the *.cl file, also the ones inside kernels that are not used and moreover, every declaration of a pointer to a __constant memory space has to be taken into account too.

In simpler words, you have to count all the times that you use the __constant keyword in the *.cl source file and this number has to be less than the maximum number of possible __constant variables supported by the device in use.

My opinion:
Ok, nothing wrong with this restriction, it has to be forced by hardware design I guess, but I think that there should be some warnings or errors… from the JIT compiler or at runtime (cl_out_of_resources or something like that), I think it has not to be possible to run a code that try to use more than the available __constant variables if it is known that in this condition it became completely useless.
NVIDIA, could you fix this? …Thanks :)

Gotcha, I realize that I wasn’t really thinking when I responded last time.

As seen in my kernel, I want to have these constant declarations in the scope of each individual kernel. Is this no longer allowed (the compiler didn’t seem to have any problem with it) or is it just broken in the latest driver/sdk implementation?

It is allowed, but you have to pay attention to not use more __constant variables than allowed from the specifications of your board, as I described in my last post; otherwise strange behaviors may happen.