I need to implement this code in OpenCL.

for (i = 0; i < M; i++)
{
    for (j = 0; j < N; j++)
    {
        tau[ f[k][i] ][ f[k][i+1] ] += Q/L[k];
    }
}

where f is an M x (N+1) matrix and tau is an N x N matrix.

When I set the global range to M x N and run the computation, I need some kind of synchronization, because otherwise one work-item's update to tau can overwrite another's. I tried to use atom_add, but the NVIDIA implementation only allows it for int and long. I need float.

What should I do?
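
One workaround that is often suggested for this situation is to emulate the float add with a 32-bit integer compare-and-swap loop: read the current bits, compute the new sum, and retry if another thread changed the value in the meantime. Below is a minimal sketch of that idea in plain C11 so it can be tried on the host. The names atomic_add_float and load_float are hypothetical helpers, not part of any OpenCL or NVIDIA API, and the sketch assumes float is 32 bits wide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size, so the bits can be swapped. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: atomically add val to the float whose raw bits live in *addr.
 * Uses only an integer compare-and-swap. Returns the value seen before the add. */
static float atomic_add_float(_Atomic uint32_t *addr, float val)
{
    uint32_t old_bits = atomic_load(addr);
    uint32_t new_bits;
    float old_val, new_val;

    do {
        /* Reinterpret the current bits as a float and compute the new sum. */
        memcpy(&old_val, &old_bits, sizeof old_val);
        new_val = old_val + val;
        memcpy(&new_bits, &new_val, sizeof new_bits);
        /* If *addr no longer equals old_bits, the swap fails, old_bits is
         * refreshed with the current contents, and the loop retries. */
    } while (!atomic_compare_exchange_weak(addr, &old_bits, new_bits));

    return old_val;
}

/* Hypothetical helper: read the stored float back. */
static float load_float(_Atomic uint32_t *addr)
{
    uint32_t bits = atomic_load(addr);
    float v;
    memcpy(&v, &bits, sizeof v);
    return v;
}
```

In an OpenCL kernel the same loop would be written with the built-in atomic_cmpxchg on a volatile __global unsigned int pointer (atom_cmpxchg from the cl_khr_global_int32_base_atomics extension on OpenCL 1.0), using as_float and as_uint for the bit reinterpretation. That assumes tau is stored as a flat buffer of N*N floats, so an element is addressed as tau[a*N + b]. Note that the order of the additions is not deterministic, so results may differ slightly between runs because of float rounding.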