Could someone help me with my kernel? I want it to do the same thing as this function:

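void Removemean(float* rfData)
{
    // nmpts is my number of columns
    // nmlne is my number of rows
    float avg = 0.0f;

    for (int i = 0; i < nmlne; i++)
    {
        avg = 0.0f;
        // sum row i, then divide by the number of columns to get its mean
        for (int j = 0; j < nmpts; j++) avg += rfData[i * nmpts + j];
        avg /= nmpts;
        // subtract the row mean from every element of row i
        for (int j = 0; j < nmpts; j++) rfData[i * nmpts + j] -= avg;
    }
}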

I have a kernel that works, but it is not fast enough (its memory accesses are not coalesced).

How can I write it so that the accesses are coalesced?
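If each thread processes a whole row, then at every loop iteration neighbouring threads read addresses that are nmpts floats apart, so a warp's loads cannot be coalesced. A common alternative is one thread block per row: thread t handles columns t, t + blockDim, ..., so a warp reads consecutive floats, and the row sum is combined with a shared-memory reduction. Below is a minimal sketch of that idea, not a tuned implementation. It assumes rfData is a contiguous row-major nmlne x nmpts float array as in the function above, a block size of 256, and the names removeMeanKernel, removeMeanGPU and BLOCK_SIZE are hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical block size; must be a power of two for the tree reduction below.
#define BLOCK_SIZE 256

// One block per row. Thread t touches columns t, t + BLOCK_SIZE, ..., so within
// one loop iteration a warp reads 32 consecutive floats of the row (coalesced).
__global__ void removeMeanKernel(float* rfData, int nmlne, int nmpts)
{
    __shared__ float partial[BLOCK_SIZE];

    int row = blockIdx.x;
    if (row >= nmlne) return;  // whole block exits together, no divergent __syncthreads
    float* line = rfData + (size_t)row * nmpts;

    // 1) Each thread accumulates a strided partial sum of its row.
    float sum = 0.0f;
    for (int j = threadIdx.x; j < nmpts; j += BLOCK_SIZE)
        sum += line[j];
    partial[threadIdx.x] = sum;
    __syncthreads();

    // 2) Tree reduction in shared memory; partial[0] ends up holding the row sum.
    for (int s = BLOCK_SIZE / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    float avg = partial[0] / nmpts;

    // 3) Subtract the mean using the same coalesced access pattern.
    for (int j = threadIdx.x; j < nmpts; j += BLOCK_SIZE)
        line[j] -= avg;
}

// Hypothetical host helper: copy to the device, launch one block per row, copy back.
cudaError_t removeMeanGPU(float* h_data, int nmlne, int nmpts)
{
    size_t bytes = (size_t)nmlne * nmpts * sizeof(float);
    float* d_data = nullptr;
    cudaError_t err = cudaMalloc((void**)&d_data, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        removeMeanKernel<<<nmlne, BLOCK_SIZE>>>(d_data, nmlne, nmpts);
        err = cudaGetLastError();  // reports launch configuration errors
    }
    if (err == cudaSuccess)  // blocking copy also surfaces kernel execution errors
        err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    return err;
}
```

With very short rows (nmpts much smaller than the block size) most threads in each block would sit idle; assigning one warp per row is a frequently used variant in that case.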

Many thanks