Hello and happy new year to everyone,
I recently started using OpenCL to develop software. As a first program I took a sample from a book and altered it. It compiles without errors or warnings, but I do not understand the result.
The program’s kernel should do the following: it gets an array with, for example, 128 elements, computes the mean and standard deviation of each block of 64 elements, and stores the standard deviation at a certain position in the output array. So for an array with 128 elements, there should be two standard deviations in the output array.
Unfortunately, when I compile and execute the program there are four values in my output array and I do not understand why.
I set globalWorkSize = 128 and localWorkSize = 64, so the complete array with 128 elements is divided into two work-groups with 64 work-items each, right?
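This is how I understand the indexing, written as a small plain C sketch. It assumes a one-dimensional range, no global offset, and a global size that is a multiple of the local size. WorkItemPos, locate and num_groups are hypothetical names, not OpenCL functions.

```c
#include <stddef.h>

/* Hypothetical helper type: where one work-item sits in a 1-D range. */
typedef struct {
    size_t group_id;  /* what get_group_id(0) would return */
    size_t local_id;  /* what get_local_id(0) would return */
} WorkItemPos;

/* Maps a global id to its work-group and its position inside that group.
   Assumption: no global offset is used when enqueueing the kernel. */
static WorkItemPos locate(size_t global_id, size_t local_size)
{
    WorkItemPos p = { global_id / local_size, global_id % local_size };
    return p;
}

/* Number of work-groups when global_size is a multiple of local_size. */
static size_t num_groups(size_t global_size, size_t local_size)
{
    return global_size / local_size;
}
```

With 128 and 64 this gives two groups, and get_global_id(0) takes every value from 0 to 127, once per work-item.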
Here is the kernel I use:
__kernel void hello_kernel(__global const float *src,
                           __global float *temp,
                           __global float *sigma)
{
    int gid = get_global_id(0);
    int size = 64, i = 0, iweight = 31;
    float mean[1] = {0.0}, stdDev[1] = {0.0};
    float sum = 0.0, sumPow = 0.0;
    float numerator = 0.0, denominator = 0.0;

    /* Compute array start position */
    const uint start = gid * 64;

    /* Mean and standard deviation */
    temp[gid] = src[gid];
    for (int i = 0; i < size; i++)
    {
        sum = sum + temp[start + i];
        sumPow = sumPow + temp[start + i] * temp[start + i];
    }
    numerator = (size * sumPow) - (pow(sum, 2.0));
    denominator = (64 * (64 - 1));
    mean[0] = sum / 64;
    i = (int)(round(iweight * mean[0]));
    stdDev[0] = sqrt(numerator / denominator);
    if (stdDev[0] < sigma[i]) sigma[i] = stdDev[0];
}
My system:
Win 7 32 bit Prof.
GeForce 9600 GT 512MB RAM
Display Driver Version: 280.26
Visual Studio 2010 Prof.
I hope that someone can help me with my problem and thank you very much!!
Wolfheart