Segmentation faults when using __constant kernel args: upgraded to GTX4xx and I get strange segfaults

So I have a few kernels that all worked well on my old Quadro FX 1700.
I installed an additional GTX 470 and tried to run the very same kernels on that.
Those tools that run a single kernel multiple times work, and are FAST.
Any tool that runs a kernel with a __constant argument (it’s __constant float * attr, backed by a cl_float[10]) will segfault when running another kernel after that.
The first one works and gives output. The second one can even be a dummy kernel, it doesn’t matter: segfault.

If I change the __constant qualifiers on the first kernel’s arguments to __global, it works.
If I use the Quadro, it works.
Tried some examples from the SDK, like oclVolume or oclParticle or oclBandwidthTest - all seem ok with both cards.
Tried changing __constant float * attr to __constant float attr[10] and anything else I could think of.
Tried commenting out all __constant vars at program scope.
The buffers are CL_MEM_READ_ONLY.
It’s float[10] + float, so 2 args totalling 44 bytes; no constant-buffer limits are hit there.
Whether Xorg is running or not doesn’t matter.
All the kernels I run are always on the same program, context and device.
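
For reference, the size arithmetic above can be checked with a small host-side C sketch. The helper names are made up for illustration, and the two limits are only the minimums that, as far as I know, the OpenCL 1.0 specification requires of full-profile devices (64 KB per constant buffer, 8 __constant arguments). The real values for a given card come from clGetDeviceInfo with CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE and CL_DEVICE_MAX_CONSTANT_ARGS.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed limits: spec minimums for full-profile devices, NOT values
   queried from the GTX 470 or the Quadro. */
static const size_t MIN_CONSTANT_BUFFER_BYTES = 64 * 1024;
static const size_t MIN_CONSTANT_ARGS = 8;
static const size_t FLOAT_BYTES = 4; /* cl_float is a 32-bit float */

/* Hypothetical helper: total bytes used by n_args float buffers,
   where elem_counts[k] is the number of floats in buffer k. */
size_t constant_bytes(const size_t *elem_counts, size_t n_args)
{
    size_t total = 0;
    for (size_t k = 0; k < n_args; ++k)
        total += elem_counts[k] * FLOAT_BYTES;
    return total;
}

/* Hypothetical helper: returns 1 if the argument count and every single
   buffer stay within the assumed limits, 0 otherwise. */
int fits_constant_limits(const size_t *elem_counts, size_t n_args)
{
    assert(elem_counts != NULL);
    if (n_args > MIN_CONSTANT_ARGS)
        return 0;
    for (size_t k = 0; k < n_args; ++k)
        if (elem_counts[k] * FLOAT_BYTES > MIN_CONSTANT_BUFFER_BYTES)
            return 0;
    return 1;
}
```

With elem_counts = {10, 1} (attr and precision) this gives 44 bytes in 2 arguments, far below either limit.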

Kernels (though it seems only the arguments matter):

__kernel void quantize(
    __read_only image2d_t input,
    __constant float * attr, // << i do it like that to keep number of constants small and below limit
    __write_only image2d_t filled,
    __constant float * precision
)
{
    // Flatten the (up to 3D) global id into one linear work-item index.
    const unsigned int i = (get_global_id(2) * get_global_size(1) * get_global_size(0)) + (get_global_id(1) * get_global_size(0)) + get_global_id(0);
    const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_FILTER_NEAREST | CLK_ADDRESS_CLAMP_TO_EDGE;
    // Column and row inside a region that is attr[0] - (attr[6]+attr[8]) wide.
    const unsigned int x = i % (int)(attr[0] - (attr[6]+attr[8]));
    const unsigned int y = (i - x) / (attr[0] - (attr[6]+attr[8]));
    // Shift by the offsets stored in attr[6] and attr[7].
    const int2 pos = (int2)(x+attr[6],y+attr[7]);
    float4 v = read_imagef(input,sampler,pos);
    if(v.x == attr[4]) return; // skip pixels whose value equals attr[4]
    v.x = round(v.x / *precision) * *precision; // snap to a multiple of *precision
    write_imagef(filled,pos,v);
}

// Dummy kernel: running it after quantize is enough to trigger the segfault.
__kernel void test()
{
    float nothing = 27*27;
    return;
}
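
In case the index arithmetic in quantize is hard to follow, here is a host-side C sketch of the same mapping from the linear work-item index to a pixel position. It assumes attr[0] is the image width, attr[6] and attr[8] are left and right margins, and attr[7] is the top margin. The post does not say what the entries mean, so that reading is a guess, and pixel_pos and work_item_to_pixel are made-up names. The sketch uses integer division, which gives the same row as the kernel's (i - x) / width whenever the float division is exact.

```c
#include <assert.h>

/* Hypothetical result type: a pixel coordinate in the image. */
typedef struct {
    unsigned int x;
    unsigned int y;
} pixel_pos;

/* Hypothetical helper mirroring the kernel:
   width = attr[0] - (attr[6] + attr[8])
   x     = i % width, shifted right by attr[6]
   y     = i / width, shifted down  by attr[7] */
pixel_pos work_item_to_pixel(unsigned int i, const float *attr)
{
    const unsigned int width = (unsigned int)(attr[0] - (attr[6] + attr[8]));
    assert(width > 0); /* a zero width would divide by zero below */

    pixel_pos p;
    p.x = i % width + (unsigned int)attr[6];
    p.y = i / width + (unsigned int)attr[7];
    return p;
}
```

For an 8-wide image with attr[6] = 2, attr[7] = 1 and attr[8] = 1, the usable width is 5, so work-item 7 lands on pixel (4, 2).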

I read that there is a difference between accessing global and constant memory through a pointer, but I can’t see why that should be a problem, as the code runs on the Quadro GPU.
I had the thought that the Quadro may just emulate __constant as __global, and so does not crash on a wrong use because it is really using __global.
But how wrong can reading a __constant be… As I saw __constant kernel args in the SDK examples, I suppose that’s legit.

I am not sure anymore whether this is all just plain wrong and the fact that it works on the Quadro is mere coincidence. Help me.

SYS:
Debian x86_64, GTX 470, Quadro FX 1700, driver 256.25 (also tried 195.36.31, 195.36.15dev and others; the older ones don’t work at all, and 195.36.31 crashes on wait/finish/flush)

Ah, sounds familiar: the 2nd kernel that uses the same constant arg fails. I have this problem on Win7 with a GTX480, & on OSX with an M9400 / M9600. I kind of gave up, commented the constants out & used globals instead. Thought it was just me doing something wrong. These platforms did not actually kill the program, but the data in the constant of the 2nd kernel is garbage.

Sorry, I cannot help, but thought you should get the feedback that you are not the only one to get this. I tried to isolate it on OSX, and could not get it to fail with very simple kernels. This issue has such an easy workaround that I have put off doing anything about it.

That’s … impractical, but thanks for your reply! So I will go down the same road and comment out all my __constant kernel arguments, at least where I use multiple kernels within the same context. I am confident that upcoming driver releases will address this issue.

I just downloaded and installed devdriver_3.1_linux_64_258.19_opencl1.1.run from the registered developer page, and it seems as if the issue with the __constant kernel arguments is fixed. I had almost forgotten about that registered developer page.

Don’t count your chickens until… It seems to be working, but the results aren’t correct. It doesn’t segfault anymore, but it doesn’t work either. Access to certain (not all) __constants now just returns 0…

I have the same problem after upgrading to FW 2xx.xx, on both Linux and Windows and on all my cards: 8700M GT, 8800 GT, GTX 285.

Access to __constant returns 0.

If I change the kernel argument from __constant to __global it works again. The previous FW 195.36.31 worked perfectly with my app.

Maybe I missed something: is it a known issue, and is there a fix?

UPDATE: I found out that this problem only occurs if I have multiple __kernel functions inside a single .cl file that use __constant memory. As a workaround you can simply use separate .cl files, each containing only a single __kernel function.
