So I have a few kernels that all worked well on my old Quadro FX 1700.
I installed an additional GTX 470 and tried to run the very same kernels on that.
Those tools that run a single kernel multiple times work, and are FAST.
Any tool that runs a kernel with a __constant argument (it's __constant float * attr, backed by a cl_float[10]) segfaults when it runs another kernel afterwards.
The first kernel works and gives output. The second one can even be a dummy kernel, it doesn't matter: segfault.
If I change the __constant arguments of the first kernel to __global, it works.
If i use the Quadro, it works.
Tried some examples from the SDK, like oclVolume or oclParticle or oclBandwidthTest - all seem ok with both cards.
Tried changing __constant float * attr to __constant float attr[10] and anything else I could think of.
Tried commenting out all __constant vars on program scope.
The buffers are CL_MEM_READ_ONLY.
It's a float[10] plus a float, so 2 args totalling 44 bytes; no constant-buffer limits are hit there.
Whether Xorg is running or not doesn't matter.
All the kernels I run are always in the same program, context and device.
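To make the limit check mentioned above concrete, here is a minimal sketch in plain C. It assumes a host program would read the two limits from clGetDeviceInfo with CL_DEVICE_MAX_CONSTANT_ARGS and CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE; the function name constant_args_fit and its parameters are made up for illustration and are not part of OpenCL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the __constant kernel arguments fit the
 * device limits, else 0.
 * max_args  stands in for the value of CL_DEVICE_MAX_CONSTANT_ARGS,
 * max_bytes stands in for the value of CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE,
 * both of which a host program would query with clGetDeviceInfo. */
int constant_args_fit(const size_t *arg_bytes, size_t n_args,
                      size_t max_args, size_t max_bytes)
{
    assert(arg_bytes != NULL || n_args == 0);

    /* Too many __constant arguments on one kernel. */
    if (n_args > max_args)
        return 0;

    /* Each constant buffer must fit the per-buffer size limit. */
    for (size_t k = 0; k < n_args; ++k) {
        if (arg_bytes[k] > max_bytes)
            return 0;
    }
    return 1;
}
```

With the two arguments from this post (10 floats and 1 float) and the minimum limits the OpenCL spec requires (8 constant args, 64 KB per constant buffer), the check passes with a lot of room to spare, which is why the limits look like an unlikely cause.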
Kernels (though it seems only the arguments matter):
__kernel void quantize(
    __read_only image2d_t input,
    __constant float * attr, // << I do it like that to keep number of constants small and below limit
    __write_only image2d_t filled,
    __constant float * precision
)
{
    const unsigned int i = (get_global_id(2) * get_global_size(1) * get_global_size(0))
                         + (get_global_id(1) * get_global_size(0))
                         + get_global_id(0);
    const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_FILTER_NEAREST | CLK_ADDRESS_CLAMP_TO_EDGE;
    const unsigned int x = i % (int)(attr[0] - (attr[6]+attr[8]));
    const unsigned int y = (i - x) / (attr[0] - (attr[6]+attr[8]));
    const int2 pos = (int2)(x+attr[6],y+attr[7]);
    float4 v = read_imagef(input,sampler,pos);
    if(v.x == attr[4]) return;
    v.x = round(v.x / *precision) * *precision;
    write_imagef(filled,pos,v);
}

__kernel void test()
{
    float nothing = 27*27;
    return;
}
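For anyone who wants to check the index arithmetic of quantize without a GPU, here is a rough host-side sketch in plain C. It assumes attr[0] is the image width, attr[6] and attr[8] are left and right margins, attr[7] is a top margin, and that all of them hold whole numbers; those meanings are guesses from how the kernel uses them. The names pixel_pos and linear_to_pixel are made up for this sketch.

```c
#include <assert.h>

/* Hypothetical result type: the pixel the kernel would read and write. */
typedef struct { int x; int y; } pixel_pos;

/* Hypothetical helper mirroring the kernel's mapping from the linear
 * work-item index i to an image position. */
pixel_pos linear_to_pixel(unsigned int i, const float attr[10])
{
    /* Inner width, as in the kernel: attr[0] - (attr[6] + attr[8]). */
    const int inner_width = (int)(attr[0] - (attr[6] + attr[8]));
    assert(inner_width > 0); /* guards the modulo and division below */

    const unsigned int x = i % (unsigned int)inner_width;
    const unsigned int y = (i - x) / (unsigned int)inner_width;

    /* Shift by the (assumed) left and top margins, as the kernel does. */
    pixel_pos pos = { (int)(x + attr[6]), (int)(y + attr[7]) };
    return pos;
}
```

One difference to the kernel: the kernel divides by a float when computing y, while this sketch divides by the integer inner width. The two agree as long as the attr values are whole numbers.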
I read that there is a difference between accessing global and constant memory through a pointer, but I can't see why that should be a problem, as the code runs on the Quadro GPU.
I had the thought that __constant on the Quadro might just be treated as __global because the Quadro may only emulate it, and so it doesn't crash on wrong use because it's really using __global.
But how wrong can reading a __constant be… As I saw __constant kernel args in the SDK examples, I suppose that's legit.
I'm not sure anymore whether this is all just plain wrong and the fact that it works on the Quadro is a mere coincidence. Help me.
SYS:
Debian x86_64, GTX 470, Quadro FX 1700, driver 256.25 (also tried 195.36.31, 195.36.15dev and others; the older ones don't work at all, and 195.36.31 crashes on wait/finish/flush)