Hi

I’ve tried to write a CUDA kernel to replace this C++ code:

```
for (size_t i = 0; i < i_bus; i++) {
    for (size_t k = 0; k < i_bus; k++) {
        P(i) = P(i) + V(i) * V(k) * (G(i,k) * cos(del(i) - del(k)) + B(i,k) * sin(del(i) - del(k)));
        Q(i) = Q(i) + V(i) * V(k) * (G(i,k) * sin(del(i) - del(k)) - B(i,k) * cos(del(i) - del(k)));
    }
}
```
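
For anyone who wants to reproduce the CPU side without my matrix types, here is a minimal sketch of the same double loop. It assumes `G` and `B` are stored as flat row-major arrays, so that `G(i,k)` corresponds to `G[i*n + k]` (the indexing the kernel below uses). The name `computePQ_cpu` and the use of `std::vector` are assumptions for illustration, not my actual code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (name and std::vector storage are assumptions).
// G and B are flat row-major n*n arrays: G(i,k) is G[i * n + k].
// P and Q are accumulated into, so the caller must initialise them first.
void computePQ_cpu(const std::vector<double>& del,
                   const std::vector<double>& G,
                   const std::vector<double>& B,
                   const std::vector<double>& V,
                   std::vector<double>& P,
                   std::vector<double>& Q) {
    const std::size_t n = V.size();
    assert(del.size() == n && P.size() == n && Q.size() == n);
    assert(G.size() == n * n && B.size() == n * n);

    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const double d = del[i] - del[k];  // angle difference
            const double c = std::cos(d);
            const double s = std::sin(d);
            P[i] += V[i] * V[k] * (G[i * n + k] * c + B[i * n + k] * s);
            Q[i] += V[i] * V[k] * (G[i * n + k] * s - B[i * n + k] * c);
        }
    }
}
```

With all angles set to zero the cosine is 1 and the sine is 0, so the expected values are easy to check by hand.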

My kernel is the following:

```
__global__ void computePQ(double* del, double* G, double* B, double* Q, double* P, double* V, int i_bus) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < i_bus) {
        for (int i = 0; i < i_bus; i++) {
            P[tid] += V[tid] * V[i] * (G[tid * i_bus + i] * cos(del[tid] - del[i]) + B[tid * i_bus + i] * sin(del[tid] - del[i]));
            Q[tid] += V[tid] * V[i] * (G[tid * i_bus + i] * sin(del[tid] - del[i]) - B[tid * i_bus + i] * cos(del[tid] - del[i]));
        }
    }
}
```

I’ve checked all the vectors and matrices passed as parameters, and they are identical on both sides. However, I get two different sets of results. This is the output from the CPU:

```
-7.83789e+006
8.15785e+006
319957
319957
319957
-2.98023e-008
-8.9407e-008
0
319957
319957
0
0
0
319957
0
0
```

And this one from CUDA:

```
8.15785e+006
-7.83789e+006
319957
319957
319957
0
0
0
319957
319957
0
0
0
319957
0
0
```

I don’t understand why I get similar results that are not exactly the same. From my point of view, the kernel is fairly simple.
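
To make "similar but not exactly the same" concrete, here is a minimal sketch of how the two lists above could be compared with a tolerance instead of by eye. The helper names `nearlyEqual` and `mismatches` and the tolerance values are assumptions chosen for illustration. Applied to the numbers above, it flags only the first two entries, which look swapped between the two outputs, while values such as `-2.98023e-008` versus `0` fall within the tolerance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when a and b agree within a mixed
// absolute/relative tolerance. The default tolerances are assumptions.
bool nearlyEqual(double a, double b, double absTol = 1e-6, double relTol = 1e-9) {
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= std::max(absTol, relTol * scale);
}

// Hypothetical helper: indices where the CPU and GPU results differ
// by more than the tolerance used in nearlyEqual.
std::vector<std::size_t> mismatches(const std::vector<double>& cpu,
                                    const std::vector<double>& gpu) {
    assert(cpu.size() == gpu.size());
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        if (!nearlyEqual(cpu[i], gpu[i])) {
            bad.push_back(i);
        }
    }
    return bad;
}
```

Calling `mismatches(cpu, gpu)` on the two outputs, each copied into a `std::vector<double>`, returns the indices that differ by more than rounding noise.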

Thanks for your help.