Hi,

I’m currently working on this little piece of code:

```
    #pragma acc parallel loop gang private(i,j) copy(force[0:DIM][0:N]) copyin(pos[0:DIM][0:N])
    for(i=0;i < num_particles ; ++i){
        double tmp_x = 0.0;
        double tmp_y = 0.0;
        double tmp_z = 0.0;
        #pragma acc loop //reduction(+:tmp_x,tmp_z,tmp_z)
        for(j=0;j < num_particles; ++j){
            double dx = (pos[0][j] - pos[0][i]);
            double dy = (pos[1][j] - pos[1][i]);
            double dz = (pos[2][j] - pos[2][i]);
            double r = dx * dx + dy * dy + dz * dz;
            double tmp_f;
            if(r != 0.0){
                double s = 1.0 / r;
                s = s * s * s;
                tmp_f = 100.0 * s/r * (1.0 - 2.0 * s);
            }else{
                tmp_f = 0.0;
            }
            tmp_x += tmp_f * dx;
            tmp_y += tmp_f * dy;
            tmp_z += tmp_f * dz;
        }
        force[0][i] += tmp_x;
        force[1][i] += tmp_y;
        force[2][i] += tmp_z;
    }
}
```

If I compile the code exactly as shown, it works perfectly (same results as the sequential C code) and I get the following compiler feedback:

```
472, Accelerator kernel generated
472, CC 2.0 : 47 registers; 16 shared, 96 constant, 0 local memory bytes
476, #pragma acc loop gang /* blockIdx.x */
481, #pragma acc loop vector(256) /* threadIdx.x */
```

However, I think a reduction on tmp_x, tmp_y and tmp_z should be required, since every iteration of the inner loop adds to them. When I uncomment the reduction clause, I get exactly the same compiler feedback (i.e. nothing about added reductions), but the results are wrong.
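To make the comparison concrete, here is a minimal self-contained sketch of the reduction variant checked against a sequential reference. It is only an illustration under stated assumptions: `NP`, `fill_positions`, `forces_seq`, `forces_acc` and `max_abs_diff` are hypothetical names that do not appear in the code above, and the particle positions are made up. The reduction clause in the sketch names each accumulator exactly once, whereas the commented-out clause above lists tmp_z twice and omits tmp_y; whether that explains the wrong results is not confirmed here.

```c
#include <assert.h>

#define DIM 3
#define NP 64 /* hypothetical particle count for this sketch */

/* Made-up, deterministic positions; distinct particles never coincide. */
void fill_positions(double pos[DIM][NP], int n) {
    assert(n <= NP);
    for (int i = 0; i < n; ++i) {
        pos[0][i] = 1.1 * i;
        pos[1][i] = 0.3 * (i % 5);
        pos[2][i] = 0.2 * (i % 7);
    }
}

/* Sequential reference: same loop body as in the post, no directives. */
void forces_seq(double force[DIM][NP], double pos[DIM][NP], int n) {
    assert(n <= NP);
    for (int i = 0; i < n; ++i) {
        double tmp_x = 0.0, tmp_y = 0.0, tmp_z = 0.0;
        for (int j = 0; j < n; ++j) {
            double dx = pos[0][j] - pos[0][i];
            double dy = pos[1][j] - pos[1][i];
            double dz = pos[2][j] - pos[2][i];
            double r = dx * dx + dy * dy + dz * dz;
            double tmp_f = 0.0;
            if (r != 0.0) {
                double s = 1.0 / r;
                s = s * s * s;
                tmp_f = 100.0 * s / r * (1.0 - 2.0 * s);
            }
            tmp_x += tmp_f * dx;
            tmp_y += tmp_f * dy;
            tmp_z += tmp_f * dz;
        }
        force[0][i] += tmp_x;
        force[1][i] += tmp_y;
        force[2][i] += tmp_z;
    }
}

/* OpenACC variant: the inner loop carries an explicit reduction that
   names tmp_x, tmp_y and tmp_z once each. Without an OpenACC compiler
   the pragmas are ignored and this runs sequentially. */
void forces_acc(double force[DIM][NP], double pos[DIM][NP], int n) {
    assert(n <= NP);
    #pragma acc parallel loop gang copy(force[0:DIM][0:NP]) copyin(pos[0:DIM][0:NP])
    for (int i = 0; i < n; ++i) {
        double tmp_x = 0.0, tmp_y = 0.0, tmp_z = 0.0;
        #pragma acc loop vector reduction(+:tmp_x,tmp_y,tmp_z)
        for (int j = 0; j < n; ++j) {
            double dx = pos[0][j] - pos[0][i];
            double dy = pos[1][j] - pos[1][i];
            double dz = pos[2][j] - pos[2][i];
            double r = dx * dx + dy * dy + dz * dz;
            double tmp_f = 0.0;
            if (r != 0.0) {
                double s = 1.0 / r;
                s = s * s * s;
                tmp_f = 100.0 * s / r * (1.0 - 2.0 * s);
            }
            tmp_x += tmp_f * dx;
            tmp_y += tmp_f * dy;
            tmp_z += tmp_f * dz;
        }
        force[0][i] += tmp_x;
        force[1][i] += tmp_y;
        force[2][i] += tmp_z;
    }
}

/* Largest absolute component-wise difference between two force arrays. */
double max_abs_diff(double a[DIM][NP], double b[DIM][NP], int n) {
    double worst = 0.0;
    for (int d = 0; d < DIM; ++d) {
        for (int i = 0; i < n; ++i) {
            double diff = a[d][i] - b[d][i];
            if (diff < 0.0) diff = -diff;
            if (diff > worst) worst = diff;
        }
    }
    return worst;
}
```

Both force arrays should start zeroed, since the loops accumulate with `+=`. A parallel reduction may sum in a different order than the sequential loop, so the comparison is better done with a small tolerance than with exact equality. With the PGI compiler, building with `-acc -Minfo=accel` prints the accelerator feedback for `forces_acc`.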

What am I missing here?

Thanks.

Best,

Paul