I'm hitting an error in one of my OpenCL projects. After spending some time on it, I've reduced it to the simplest CL code that still fails:
kernel void Algo( __global int* outp )
{
    int PDif;
    int PShift;
    int PAbs;
    __local int P0;

    if(get_local_id(0)==0) {
        P0 = 2;
    }
    barrier(CLK_LOCAL_MEM_FENCE);

    if(get_local_id(0)==0) {
        PShift = P0 << 1;          // Should be 4
        PDif = 1 - P0;             // Should be -1
        PAbs = abs(PDif);          // Should be 1
        outp[0] = PShift + PAbs;   // Should be 5
    }
}
The result should always be 5 since the output buffer value only depends on a constant value (P0) stored in a local variable.
Unfortunately, the result is 3, as if PAbs were negative (4 + (-1) = 3).
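To spell out the arithmetic, here is a minimal host-side sketch in plain C. It is not the kernel and not my host code: the function names expected_outp and outp_without_abs are made up for illustration, and the second function only models my guess that the abs() step is being dropped.

```c
#include <stdlib.h>  /* abs */

/* Value the kernel is expected to write to outp[0] for a given P0. */
int expected_outp(int p0)
{
    int shift = p0 << 1;      /* P0 = 2 gives 4 */
    int dif = 1 - p0;         /* P0 = 2 gives -1 */
    return shift + abs(dif);  /* 4 + 1 */
}

/* Hypothetical faulty variant (an assumption): the abs() step is skipped. */
int outp_without_abs(int p0)
{
    int shift = p0 << 1;      /* P0 = 2 gives 4 */
    int dif = 1 - p0;         /* P0 = 2 gives -1 */
    return shift + dif;       /* 4 + (-1) */
}
```

With P0 = 2, expected_outp(2) matches the 5 I expect and outp_without_abs(2) matches the 3 I actually read back, which is why I suspect the abs() call.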
There are several ways to get the correct behavior: building with the -cl-opt-disable option, adding a local barrier after PShift is computed, or declaring P0 in private memory instead of local memory.
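For reference, this is roughly what the private-memory variant looks like. It is a sketch: the kernel name AlgoPrivate is made up here, and the #ifndef block is a hypothetical host-side shim (not part of my real code) so that the same file also compiles as plain C and runs the body for a single work-item.

```c
#ifndef __OPENCL_VERSION__
/* Hypothetical host-side shim: only used when this file is compiled as
   plain C. It emulates a single work-item with local id 0. */
#include <stddef.h>
#include <stdlib.h>  /* abs */
#define kernel
#define __global
static size_t get_local_id(unsigned int dim) { (void)dim; return 0; }
#endif

kernel void AlgoPrivate( __global int* outp )
{
    int P0 = 2;  /* private instead of __local, so no barrier is needed */

    if(get_local_id(0)==0) {
        int PShift = P0 << 1;      /* Should be 4 */
        int PDif = 1 - P0;         /* Should be -1 */
        int PAbs = abs(PDif);      /* Should be 1 */
        outp[0] = PShift + PAbs;   /* Should be 5 */
    }
}
```

The host shim cannot reproduce the problem, since it never goes through the NVIDIA compiler. It only confirms that the kernel body itself computes 5.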
The platform is:
device 0x25f1490 has Quadro 2000 CL_DEVICE_NAME
device 0x25f1490 has NVIDIA Corporation CL_DEVICE_VENDOR
device 0x25f1490 has 295.49 CL_DRIVER_VERSION
device 0x25f1490 has OpenCL 1.1 CUDA CL_DEVICE_VERSION
device 0x25f1490 has FULL_PROFILE CL_DEVICE_PROFILE
device 0x25f1490 has 4 CL_DEVICE_MAX_COMPUTE_UNITS
device 0x25f1490 has 1024 CL_DEVICE_MAX_WORK_GROUP_SIZE
device 0x25f1490 has 65536 CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE
device 0x25f1490 has 1072889856 CL_DEVICE_GLOBAL_MEM_SIZE
device 0x25f1490 has 49152 CL_DEVICE_LOCAL_MEM_SIZE
device 0x25f1490 has 2 CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV
device 0x25f1490 has 1 CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV
device 0x25f1490 has 32768 CL_DEVICE_REGISTERS_PER_BLOCK_NV
[For some reason, I cannot attach the host C code. I will post it if needed.]
So what is wrong with this code?
Thanks for any help.