I need to run the following code in OpenCL:
    for (k = 0; k < M; k++)
        for (i = 0; i < N; i++)
            tau[ f[k][i] ][ f[k][i+1] ] += Q / L[k];
Here f is an M x (N+1) matrix whose entries are indices into tau, and tau is an N x N matrix.
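For reference, this is a minimal sketch of the sequential loop as a self-contained C99 function. The name `deposit` and passing M and N as parameters are my own choices for illustration. It is useful for checking the GPU results, and the test case shows the conflict directly: both rows of f visit the same pairs of indices, so two different (k, i) work-items would update the same element of tau.

```c
#include <assert.h>

/* Hypothetical helper: the sequential version of the update.
   For every row k of f (0 <= k < M) and every consecutive pair of
   entries (f[k][i], f[k][i+1]) with 0 <= i < N, add Q / L[k] to the
   corresponding element of tau. Uses C99 variable-length array parameters. */
void deposit(int M, int N, float tau[N][N],
             const int f[M][N + 1], const float L[M], float Q)
{
    for (int k = 0; k < M; k++) {
        for (int i = 0; i < N; i++) {
            int a = f[k][i];
            int b = f[k][i + 1];
            /* the entries of f are used as row/column indices into tau */
            assert(a >= 0 && a < N && b >= 0 && b < N);
            tau[a][b] += Q / L[k];
        }
    }
}
```

With M = 2, N = 3, both rows of f equal to {0, 1, 2, 0}, L = {2, 4} and Q = 1, the elements tau[0][1], tau[1][2] and tau[2][0] each receive 0.5 + 0.25 = 0.75, one contribution from each row.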
When I set the global range to M x N and run the computation, I need some kind of synchronization, because two work-items can update the same element of tau and one update can overwrite the other. I tried atom_add, but NVIDIA's implementation only supports int and long, and I need float.
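One common workaround is to build a float add out of a 32-bit integer compare-and-swap: read the element, compute the new sum, and write it back only if nobody changed the element in the meantime, retrying otherwise. The sketch below shows that retry loop with C11 `<stdatomic.h>` on the host so it can be compiled and tested on its own; it is not OpenCL code. The helper names (`float_to_bits`, `bits_to_float`, `atomic_add_float`, `atomic_add_float_cl`) are hypothetical. In a kernel, the same loop would use `atomic_cmpxchg` (OpenCL 1.1) or `atom_cmpxchg` (the `cl_khr_global_int32_base_atomics` extension) on the element viewed as an int, with `as_int`/`as_float` for the bit reinterpretation, as sketched in the comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Assumes a 32-bit IEEE float, as on OpenCL devices. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret the bits of a float as a uint32_t and back, like OpenCL's
   as_uint()/as_float(). memcpy avoids strict-aliasing problems. */
static uint32_t float_to_bits(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

/* Atomically performs *cell += val, where *cell holds the bits of a float.
   Each iteration computes the new sum from the last observed value and
   publishes it only if no other thread changed the cell in the meantime;
   otherwise atomic_compare_exchange_weak reloads `observed` and we retry.
   Returns the value the cell held just before this add. */
static float atomic_add_float(_Atomic uint32_t *cell, float val)
{
    assert(cell != NULL);
    uint32_t observed = atomic_load(cell);
    uint32_t desired;
    do {
        desired = float_to_bits(bits_to_float(observed) + val);
    } while (!atomic_compare_exchange_weak(cell, &observed, desired));
    return bits_to_float(observed);
}

/*
 * Untested OpenCL C equivalent (requires 32-bit global int atomics):
 *
 *   void atomic_add_float_cl(volatile __global float *p, float val) {
 *       int assumed;
 *       int old = as_int(*p);
 *       do {
 *           assumed = old;
 *           old = atomic_cmpxchg((volatile __global int *)p,
 *                                assumed, as_int(as_float(assumed) + val));
 *       } while (old != assumed);
 *   }
 *
 * With tau stored row-major in a flat buffer, each (k, i) work-item would call
 *   atomic_add_float_cl(&tau[f[k][i] * N + f[k][i+1]], Q / L[k]);
 */
```

Under heavy contention (many work-items hitting the same element of tau) the retry loop can be slow, and because the additions happen in an arbitrary order the result may differ in the last bits from the sequential loop.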
What should I do?