Hello,

I am reading Chapter 11, Figure 11.5 of Programming Massively Parallel Processors, 2nd ed.:

```
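// Assumes PI and FHD_THREADS_PER_BLOCK are defined elsewhere, as in the book.
__global__ void cmpFhD(float* rPhi, float* iPhi, float* rD, float* iD,
                       float* kx, float* ky, float* kz,
                       float* x, float* y, float* z,
                       float* rMu, float* iMu,
                       float* rFhD, float* iFhD, int N) {
    // One thread per sample m
    int m = blockIdx.x * FHD_THREADS_PER_BLOCK + threadIdx.x;
    rMu[m] = rPhi[m]*rD[m] + iPhi[m]*iD[m];
    iMu[m] = rPhi[m]*iD[m] - iPhi[m]*rD[m];
    // Each thread loops over all N output elements
    for (int n = 0; n < N; n++) {
        float expFhD = 2*PI*(kx[m]*x[n] + ky[m]*y[n] + kz[m]*z[n]);
        float cArg = cos(expFhD);
        float sArg = sin(expFhD);
        rFhD[n] += rMu[m]*cArg - iMu[m]*sArg;
        iFhD[n] += iMu[m]*cArg + rMu[m]*sArg;
    }
}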
```

It says that "The kernel will not execute correctly due to conflicts between threads in writing into rFhD and iFhD arrays."

I can't understand why there are conflicts.

For example,

```
rMu[0] = 2
iMu[0] = 3
rMu[1] = ..
iMu[1] = ..
rMu[2] = ..
iMu[2] = ..
.....
rFhD[0] += 2 * cArg - 3 * sArg
iFhD[0] += 3 * cArg + 2 * sArg
```

I can understand that all the values of rMu and iMu are computed at the same time, one per thread.

But when rFhD and iFhD are calculated inside the loop, shouldn't the contribution from each value of rMu and iMu (OK, we don't know whether it will be rMu[0] or rMu[2] or … first) simply be added to each element of rFhD and iFhD?
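To make the question concrete, here is a minimal sketch of the update pattern in the first kernel, reduced to a single output element. It is not the book's code: `scatterRacy`, `scatterAtomic`, `runScatter` and `out` are hypothetical names made up for this illustration. The assumption being illustrated is that `+=` on global memory is a separate load, add and store, so two threads that load the same old value can overwrite each other's addition.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical reduction of the first kernel: every thread m adds its
// contribution to the SAME element out[0], like rFhD[n] += ... for a fixed n.
__global__ void scatterRacy(float* out, int M) {
    int m = blockIdx.x * blockDim.x + threadIdx.x;
    if (m < M) {
        // Plain read-modify-write: load out[0], add 1, store the result.
        // Threads that loaded the same old value lose all but one addition.
        out[0] += 1.0f;
    }
}

// Same pattern, but the read-modify-write is a single atomic operation.
__global__ void scatterAtomic(float* out, int M) {
    int m = blockIdx.x * blockDim.x + threadIdx.x;
    if (m < M) {
        atomicAdd(&out[0], 1.0f);
    }
}

// Hypothetical host helper: launches one of the kernels with M threads
// and returns the final value of out[0].
float runScatter(bool useAtomic, int M) {
    assert(M > 0);
    float* dOut = nullptr;
    float hOut = 0.0f;
    cudaMalloc(&dOut, sizeof(float));
    cudaMemcpy(dOut, &hOut, sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (M + threads - 1) / threads;
    if (useAtomic) {
        scatterAtomic<<<blocks, threads>>>(dOut, M);
    } else {
        scatterRacy<<<blocks, threads>>>(dOut, M);
    }

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return hOut;
}
```

With M = 65536, `runScatter(true, M)` should return exactly 65536, while `runScatter(false, M)` can return anything from 1 to 65536 and would be expected to come out lower on most runs. If that is the conflict the book means, then in the first kernel the order of the threads is not the problem; the lost additions are.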

With this code, it says that we don't have conflicts anymore:

```
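// Assumes PI and FHD_THREADS_PER_BLOCK are defined elsewhere, and that
// rMu and iMu were already computed before this kernel runs.
// rFhD and iFhD are added to the parameter list because the body uses them.
__global__ void cmpFHd(float* rPhi, float* iPhi, float* phiMag,
                       float* kx, float* ky, float* kz,
                       float* x, float* y, float* z,
                       float* rMu, float* iMu,
                       float* rFhD, float* iFhD, int M) {
    // One thread per output element n
    int n = blockIdx.x * FHD_THREADS_PER_BLOCK + threadIdx.x;
    // Each thread loops over all M samples
    for (int m = 0; m < M; m++) {
        float expFhD = 2*PI*(kx[m]*x[n] + ky[m]*y[n] + kz[m]*z[n]);
        float cArg = cos(expFhD);
        float sArg = sin(expFhD);
        rFhD[n] += rMu[m]*cArg - iMu[m]*sArg;
        iFhD[n] += iMu[m]*cArg + rMu[m]*sArg;
    }
}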
```

Now each thread carries its own value. I'm OK with that.
But the first code confuses me.