Trouble with first OpenCL program

Hello and happy new year to everyone,

I recently started using OpenCL to develop software. As a first program, I took sample source code from a book and altered it. It compiles without errors or warnings, but I do not understand the result.

The program’s kernel should do the following: it receives an array with, for example, 128 elements, computes the mean and standard deviation of every block of 64 elements, and stores each standard deviation at a certain position in the output array. So for an array of 128 elements, there should be two standard deviations in the output array.

Unfortunately, when I compile and execute the program there are four values in my output array and I do not understand why.

With globalWorkSize = 128 and localWorkSize = 64, the complete array with 128 elements is divided into two work-groups with 64 work-items each, right?
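To make the numbers concrete, here is a small sketch in plain C (not OpenCL host code) of how a one-dimensional range is split up, assuming no global offset and a global size that is a multiple of the local size. The names work_group_count, locate_work_item and WorkItemPos are made up for illustration. Note that the kernel body runs once per work-item, so 128 times here, not once per work-group.

```c
#include <stddef.h>

/* Hypothetical helper type: where one work-item sits in a 1-D range. */
typedef struct {
    size_t group_id; /* what get_group_id(0) would return */
    size_t local_id; /* what get_local_id(0) would return */
} WorkItemPos;

/* Number of work-groups in a 1-D range.
   Assumes global_size is a multiple of local_size. */
size_t work_group_count(size_t global_size, size_t local_size)
{
    return global_size / local_size;
}

/* Map a global id to its work-group and its position inside that group.
   Assumes a 1-D range with no global offset. */
WorkItemPos locate_work_item(size_t global_id, size_t local_size)
{
    WorkItemPos pos;
    pos.group_id = global_id / local_size;
    pos.local_id = global_id % local_size;
    return pos;
}
```

For example, work_group_count(128, 64) gives 2, and the work-item with global id 64 is the first item (local id 0) of the second work-group (group id 1).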

Here is the kernel I use:

__kernel void hello_kernel(__global const float *src,
                           __global float *temp,
                           __global float *sigma)
{
   int gid = get_global_id(0);
   int size = 64, i = 0, iweight = 31;
   float mean[1] = {0.0}, stdDev[1] = {0.0};
   float sum = 0.0, sumPow = 0.0;
   float numerator = 0.0, denominator = 0.0;

   /* Compute array start position */
   const uint start = gid * 64;

   /* Mean and standard deviation */
   temp[gid] = src[gid];

   for (int i = 0; i < size; i++)
   {
      sum = sum + temp[start + i];
      sumPow = sumPow + temp[start + i] * temp[start + i];
   }

   numerator = (size * sumPow) - (pow(sum, 2.0));
   denominator = (64 * (64 - 1));

   mean[0] = sum / 64;
   i = (int)(round(iweight * mean[0]));

   stdDev[0] = sqrt(numerator / denominator);

   if (stdDev[0] < sigma[i]) sigma[i] = stdDev[0];
}


My system:

Win 7 32 bit Prof.

GeForce 9600 GT 512MB RAM

Display Driver Version: 280.26

Visual Studio 2010 Prof.

I hope someone can help me with this problem. Thank you very much!