Please see the kernel below and its output. When I execute this kernel, it works for the first thread, but all the other threads give wrong output. Can anyone help me figure out what the problem is?

```
__kernel void rmsCalculation(const __global float* a,
                             const __global float* C,
                             __global float* O,
                             const int col)
{
    // One work-item per row of a; each row holds col elements.
    const int ar = get_global_id(0);

    float R = 0;
    float I = 0;
    float c = 0;
    bool totalSch = true;
    float sum = 0;

    for (int j = 0; j < col; ++j)
    {
        c = C[j] * a[ar * col + j];
        I = 0;
        do
        {
            R = I + c;
            I = 0;
            // Note: T is used here but is not declared in this listing.
            if (R > T[j])
            {
                totalSch = false;
                break;
            }
            else
            {
                for (int k = 0; k < j; ++k)
                {
                    I = I + C[k] * a[ar * col + k];
                }
            }
        } while (I + c > R);
        sum = sum + R;
        if (totalSch == false)
        {
            break;
        }
    } // end for(j=0..
    O[ar] = sum;
}
```

In the output, the first element of the "O" array is calculated correctly, but all the other elements are wrong, as shown below:

0= 11

1= -9.99199e+18

2= -9.99199e+18

3= -9.99199e+18

4= -9.99199e+18

5= -9.99199e+18

6= -9.99199e+18

7= -9.99199e+18

8= -9.99199e+18

9= -9.99199e+18

10= -9.99199e+18

11= -9.99199e+18

12= -9.99199e+18

13= -9.99199e+18

14= -9.99199e+18

15= -9.99199e+18

16= -9.99199e+18

17= -9.99199e+18

18= -9.99199e+18

19= -9.99199e+18

…

So what is the problem in the code?
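
For checking the expected values, the same loop can be written as a plain C function on the host. This is only a sketch under assumptions: `rms_reference` and `rows` are hypothetical names, and `T` is assumed to be a per-column threshold array of length `col`, since its declaration is not shown in the kernel above.

```c
#include <stdbool.h>

/* CPU reference of the kernel loop: one outer iteration per work-item.
 * Assumption: T holds one threshold per column (its declaration is not
 * shown in the kernel listing). rms_reference and rows are hypothetical. */
void rms_reference(const float *a, const float *C, const float *T,
                   float *O, int rows, int col)
{
    for (int ar = 0; ar < rows; ++ar) {   /* ar plays the role of get_global_id(0) */
        float R = 0.0f, I = 0.0f, c = 0.0f, sum = 0.0f;
        bool totalSch = true;

        for (int j = 0; j < col; ++j) {
            c = C[j] * a[ar * col + j];
            I = 0.0f;
            do {
                R = I + c;
                I = 0.0f;
                if (R > T[j]) {           /* threshold exceeded: stop this row */
                    totalSch = false;
                    break;
                }
                for (int k = 0; k < j; ++k) {
                    I += C[k] * a[ar * col + k];
                }
            } while (I + c > R);
            sum += R;
            if (!totalSch) {
                break;
            }
        }
        O[ar] = sum;
    }
}
```

Running this on the same `a`, `C` and `T` data and comparing its `O` with the buffer read back from the device would show which rows differ from what the loop should produce.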