Hello everyone,

I'm trying to solve the following problem. The full explanation is a bit long, so here is the short version:

I am getting write-after-write errors due to parallelization. You can read the long version below or go straight to the code.

I have an array of arrays (a matrix), and I want to calculate the Euclidean distances between its rows.

Each row of the matrix corresponds to a 64-dimensional vector.

If I want to calculate the squared distance between the first row vector and the second row vector, for example, I have to sum up all 64 squared differences. This sum should be accumulated into a particular field of another array.
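Written as a formula, the value I expect in the target array looks like this. It assumes that the matrix is stored row by row in the flat array `a` (as the indexing in my kernel implies); the symbols `c` for the candidate row, `r` for the row being compared, and `k` for the component index are introduced here for illustration only:

```latex
b[r] = \sum_{k=0}^{63} \bigl( a[64r + k] - a[64c + k] \bigr)^2
```

With `c = 0`, each term is the `diff*diff` of one thread in the kernel, and all 64 terms of a row have to end up in the same field `b[r]`.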

That is where the problem lies. Because the threads run in parallel, write-after-write hazards occur, so the target field does not end up holding the sum but only 0 (the starting value) plus a single squared difference (usually the 64th one).

Here is my kernel:

```
__global__ void square_array(int *a, int *b, int candidate, int N, int linescount)
{
    // We assume that candidate == 0.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // Only threads 65 and 66 pass this check; both write to b[1].
    if ((64 < idx) && (idx < 67)) {
        int diff = (a[idx] - a[idx % 64]);
        // Here comes the problematic line: the value of b[...] is not being
        // updated in such a way that another thread sees the new value.
        b[(idx - idx % 64) / 64] = b[(idx - idx % 64) / 64] + diff * diff;
    }
    //b[(idx-idx%64)/64] = b[(idx-idx%64)/64] + diff*diff;
}
```

How can I solve this problem? I have already experimented with `__syncthreads()`, but without success.
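One direction that seems to target exactly this kind of lost update is `atomicAdd`, which performs the read, the addition, and the write as a single indivisible operation. A barrier such as `__syncthreads()` only makes the threads of one block wait for each other and does not make the update itself atomic. The following is a minimal sketch under assumptions, not a definitive implementation: the kernel name `squared_distances`, the host helper `squared_distances_gpu`, the constant `DIM`, and the launch configuration are hypothetical, the matrix is assumed to be stored row by row, and error checking is left out.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int DIM = 64;  // vector length, taken from the description above

// Hypothetical variant of square_array: one thread per matrix element.
// a: rows * DIM ints stored row by row; b: rows ints, zeroed before launch.
__global__ void squared_distances(const int *a, int *b, int candidate, int rows)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= rows * DIM) return;   // surplus threads do nothing

    int row = idx / DIM;             // same value as (idx - idx % 64) / 64
    int col = idx % DIM;
    int diff = a[idx] - a[candidate * DIM + col];

    // atomicAdd reads, adds, and writes b[row] as one indivisible step,
    // so concurrent additions to the same field are not lost.
    atomicAdd(&b[row], diff * diff);
}

// Hypothetical host helper: copies a to the device, runs the kernel,
// and returns b. Assumes a.size() == rows * DIM. No error checking.
std::vector<int> squared_distances_gpu(const std::vector<int> &a, int rows, int candidate)
{
    int *d_a = nullptr;
    int *d_b = nullptr;
    cudaMalloc(&d_a, a.size() * sizeof(int));
    cudaMalloc(&d_b, rows * sizeof(int));
    cudaMemcpy(d_a, a.data(), a.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_b, 0, rows * sizeof(int));   // starting value 0 for every sum

    int threads = 256;                                   // assumed block size
    int blocks = (rows * DIM + threads - 1) / threads;   // round up
    squared_distances<<<blocks, threads>>>(d_a, d_b, candidate, rows);

    // This blocking copy runs only after the kernel has finished.
    std::vector<int> b(rows);
    cudaMemcpy(b.data(), d_b, rows * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    return b;
}
```

With the candidate fixed at row 0, `b[r]` should then hold the complete sum for row `r`. Atomic additions to the same address are serialized, so a per-block reduction in shared memory might be faster for large inputs, but I have not measured that.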