Strange behavior on a simple test case: the result of a very simple arithmetic computation is wrong

There is an error in one of my OpenCL projects. After spending some time on it, I’ve found the simplest CL code that fails:

kernel void Algo( __global int* outp )
{
	int PDif;
	int PShift;
	int PAbs;
	__local int P0;

	if(get_local_id(0)==0) {
		P0 = 2;
	}

	if(get_local_id(0)==0) {
		PShift  = P0 << 1;       // Should be 4
		PDif    = 1 - P0;        // Should be -1
		PAbs    = abs(PDif);     // Should be 1
		outp[0] = PShift + PAbs; // Should be 5
	}
}

The result should always be 5 since the output buffer value only depends on a constant value (P0) stored in a local variable.

Unfortunately, the result is … 3! As if the PAbs result were negative!
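To make the arithmetic explicit, here is a minimal host-side sketch in plain C (not the poster's host code; the function names `expected_result` and `result_without_abs` are hypothetical). It reproduces the kernel's computation on the CPU, and also the same computation with the `abs()` step left out, which is one way to arrive at the observed value.

```c
#include <stdlib.h>  /* abs */

/* Reference for the kernel arithmetic, for a given P0. */
int expected_result(int p0)
{
    int p_shift = p0 << 1;      /* 4 when p0 == 2  */
    int p_dif   = 1 - p0;       /* -1 when p0 == 2 */
    int p_abs   = abs(p_dif);   /* 1 when p0 == 2  */
    return p_shift + p_abs;     /* 5 when p0 == 2  */
}

/* Same computation with the abs() step skipped (illustration only). */
int result_without_abs(int p0)
{
    int p_shift = p0 << 1;      /* 4 when p0 == 2  */
    int p_dif   = 1 - p0;       /* -1 when p0 == 2 */
    return p_shift + p_dif;     /* 3 when p0 == 2  */
}
```

With P0 = 2, `expected_result` returns 5 and `result_without_abs` returns 3. That matches the symptom, but it only shows the numbers are consistent with a dropped `abs()`; it does not prove that this is what the compiler actually does.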

There are various ways to get correct behavior, such as using the -cl-opt-disable build option, adding a local barrier after the computation of the PShift variable, or declaring the P0 variable in private memory.
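As an illustration of the private-memory workaround, the kernel might look like the sketch below. The kernel name `AlgoPrivate` and the host-side shim section are assumptions added here, not part of the original code. The shims only let the same source compile as plain C for a single emulated work-item, so the expected value can be checked on the CPU; they are skipped when the OpenCL compiler builds the file, since it defines `__OPENCL_VERSION__`.

```c
#ifndef __OPENCL_VERSION__
/* Host-side shims (assumption: emulate one work-item with local id 0)
   so the kernel source also compiles as ordinary C. */
#include <stdlib.h>   /* abs */
#include <stddef.h>   /* size_t */
#define kernel
#define __global
static size_t get_local_id(unsigned int dim) { (void)dim; return 0; }
#endif

kernel void AlgoPrivate(__global int *outp)
{
    int P0 = 2;  /* private variable instead of __local */

    if (get_local_id(0) == 0) {
        int PShift = P0 << 1;      /* Should be 4  */
        int PDif   = 1 - P0;       /* Should be -1 */
        int PAbs   = abs(PDif);    /* Should be 1  */
        outp[0] = PShift + PAbs;   /* Should be 5  */
    }
}
```

Built as plain C, this only confirms the arithmetic. Whether the NVIDIA compiler produces correct code for it still has to be checked on the device itself.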

The platform is:

device 0x25f1490 has              Quadro 2000 CL_DEVICE_NAME
device 0x25f1490 has       NVIDIA Corporation CL_DEVICE_VENDOR
device 0x25f1490 has                   295.49 CL_DRIVER_VERSION
device 0x25f1490 has          OpenCL 1.1 CUDA CL_DEVICE_VERSION
device 0x25f1490 has             FULL_PROFILE CL_DEVICE_PROFILE
device 0x25f1490 has                        4 CL_DEVICE_MAX_COMPUTE_UNITS
device 0x25f1490 has                     1024 CL_DEVICE_MAX_WORK_GROUP_SIZE
device 0x25f1490 has                    65536 CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE
device 0x25f1490 has               1072889856 CL_DEVICE_GLOBAL_MEM_SIZE
device 0x25f1490 has                    49152 CL_DEVICE_LOCAL_MEM_SIZE
device 0x25f1490 has                        2 CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV
device 0x25f1490 has                        1 CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV
device 0x25f1490 has                    32768 CL_DEVICE_REGISTERS_PER_BLOCK_NV

[For some unknown reason, I cannot attach the host C code. If needed, I will post it.]

So what is wrong with this code?

Thanks for any help.

Make sure you are managing your buffers correctly.

Thanks for your reply.

I don’t think there is a problem with the way the buffers are managed.
There are several algorithms (each several hundred CL lines long) that work on images of up to 8 MPixels.
Only one of them fails. From that one, I’ve extracted the piece of code that generates the error to make this test case.
I can send you the host C code if you want to check it.

I’ve always found it instructive to output intermediate results when things go bad. Try something like this (and you may need to move the outp = statements):


	if(get_local_id(0)==0) {
		PShift  = P0 << 1;       // Should be 4
		PDif    = 1 - P0;        // Should be -1
		PAbs    = abs(PDif);     // Should be 1
		outp[0] = PShift + PAbs; // Should be 5
		outp[1] = PShift;
		outp[2] = PAbs;
		outp[3] = P0;
		outp[4] = abs ( -1 );
		// etc...
	}

Yes, you are right. It is always instructive, and that is why I did it.
But, unfortunately, writing out the intermediate values leads to correct behavior … Both the final and the intermediate values are right when they are traced …

How can I determine whether it is a compiler or a hardware error without changing the graphics board?