Hello and happy new year to everyone,

I recently started using OpenCL to develop software. As a first program, I took sample source code from a book and altered it. It compiles without errors or warnings, but I do not understand the result.

The program's kernel should do the following: given an array of, for example, 128 elements, it should compute the mean and standard deviation of every block of 64 elements and store each standard deviation at a certain position in the output array. So for an array of 128 elements, there should be two standard deviations in the output array.

Unfortunately, when I compile and run the program, there are four values in my output array, and I do not understand why.

I set globalWorkSize = 128 and localWorkSize = 64, so the complete array of 128 elements is divided into two work-groups with 64 work-items each, right?
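The relationship between the two sizes can be modelled on the host. The sketch below assumes a one-dimensional range with no global offset and a global size that is a multiple of the local size. `work_item_ids`, `num_work_groups`, and `ids_for` are hypothetical names used only for illustration; they mirror what `get_global_id(0)`, `get_group_id(0)`, and `get_local_id(0)` return under those assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of the IDs one work-item sees in a 1-D range
 * with no global offset. */
typedef struct {
    size_t global_id; /* what get_global_id(0) would return */
    size_t group_id;  /* what get_group_id(0) would return */
    size_t local_id;  /* what get_local_id(0) would return */
} work_item_ids;

/* Number of work-groups for a global size that divides evenly. */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Group and local IDs derived from a global ID. */
work_item_ids ids_for(size_t global_id, size_t local_size)
{
    work_item_ids ids;
    ids.global_id = global_id;
    ids.group_id = global_id / local_size;
    ids.local_id = global_id % local_size;
    return ids;
}
```

With a global size of 128 and a local size of 64, `num_work_groups` gives 2, so there are indeed two work-groups. The kernel body still runs once per work-item, so 128 times in total, with `get_global_id(0)` ranging from 0 to 127.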

Here is the kernel I use:

```
__kernel void hello_kernel(__global const float *src,
                           __global float *temp,
                           __global float *sigma)
{
    int gid = get_global_id(0);
    int size = 64, i = 0, iweight = 31;
    float mean[1] = {0.0}, stdDev[1] = {0.0};
    float sum = 0.0, sumPow = 0.0;
    float numerator = 0.0, denominator = 0.0;

    /* Compute array start position */
    const uint start = gid * 64;

    /* Mean and standard deviation */
    temp[gid] = src[gid];
    for (int i = 0; i < size; i++)
    {
        sum = sum + temp[start + i];
        sumPow = sumPow + temp[start + i] * temp[start + i];
    }
    numerator = (size * sumPow) - (pow(sum, 2.0));
    denominator = (64 * (64 - 1));
    mean[0] = sum / 64;
    i = (int)(round(iweight * mean[0]));
    stdDev[0] = sqrt(numerator / denominator);
    if (stdDev[0] < sigma[i]) sigma[i] = stdDev[0];
}
```
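One thing that may be worth checking is which elements of `temp` each work-item reads. The sketch below traces the kernel's `start = gid * 64` arithmetic on the host. `last_index_read` and `work_items_in_bounds` are hypothetical helper names, and the buffer length of 128 elements is an assumption taken from the example in the description, since the actual allocation size of `temp` is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Last index of temp read by the work-item with this gid, following the
 * kernel's start = gid * 64 and its loop over i < 64. */
size_t last_index_read(size_t gid, size_t block_size)
{
    assert(block_size > 0);
    return gid * block_size + (block_size - 1);
}

/* Counts the work-items whose whole read window lies inside a buffer of
 * elem_count elements (assumed buffer length, not shown in the post). */
size_t work_items_in_bounds(size_t global_size, size_t block_size,
                            size_t elem_count)
{
    size_t in_bounds = 0;
    for (size_t gid = 0; gid < global_size; gid++) {
        if (last_index_read(gid, block_size) < elem_count)
            in_bounds++;
    }
    return in_bounds;
}
```

Under these assumptions, `work_items_in_bounds(128, 64, 128)` is 2: only the work-items with `gid` 0 and 1 stay inside a 128-element buffer, while the work-item with `gid` 127 would read up to index 8191. The trace also shows that a work-item reads `temp` entries that other work-items write in `temp[gid] = src[gid]`, and the kernel contains no barrier between that write and the loop.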

My system:

Windows 7 Professional, 32-bit

GeForce 9600 GT 512MB RAM

Display Driver Version: 280.26

Visual Studio 2010 Professional

I hope someone can help me with this problem. Thank you very much!

Wolfheart