Hello guys!
I’m trying to calculate the sum of a simple vector.
I think the most difficult concept to understand in OpenCL is local (shared) memory.
So all work-items in the same work-group share the “__local” memory, right?
To test this, I wrote a kernel that sums a vector using only one work-group.
Here is the kernel:
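__kernel void VectorAdd(__global const float* vector, __global float* result, __local float *partResult)
{
    int id = get_global_id(0);

    // The first work-item zeroes the shared accumulator.
    if (id == 0)
        *partResult = 0;
    barrier(CLK_LOCAL_MEM_FENCE);

    // Every work-item adds its own element to the shared accumulator.
    *partResult += vector[id];
    barrier(CLK_LOCAL_MEM_FENCE);

    // The last work-item copies the total to global memory.
    if (id == get_global_size(0)-1)
        *result = *partResult;
}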
I initialize the __local value to 0 and synchronize, then do the sum, synchronize again, and copy the result to global memory.
What is wrong with this code?
Keep in mind that there is only one work-group.
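For reference, the value I expect in result is what a plain sequential loop on the host would produce. A minimal sketch in C, where the name reference_sum is only a hypothetical helper for illustration and not part of OpenCL:

```c
#include <stddef.h>

/* Sequential reference: adds the n floats in v one after another.
   reference_sum is a hypothetical helper name, not an OpenCL call. */
float reference_sum(const float *v, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```

For an input of {1, 2, 3, 4} this gives 10, which is the value I would expect the kernel to write into result.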
Thanks!