Why are there conflicts between threads?

Hello,

I am reading Chapter 11, Figure 11.5 of Programming Massively Parallel Processors, 2nd ed.:

__global__ void cmpFhD(float* rPhi, float* iPhi, float* rD, float* iD,
                       float* kx, float* ky, float* kz,
                       float* x, float* y, float* z,
                       float* rMu, float* iMu, float* rFhD, float* iFhD, int N) {

    // One thread per m. PI and FHD_THREADS_PER_BLOCK are macros defined elsewhere.
    int m = blockIdx.x * FHD_THREADS_PER_BLOCK + threadIdx.x;

    rMu[m] = rPhi[m]*rD[m] + iPhi[m]*iD[m];
    iMu[m] = rPhi[m]*iD[m] - iPhi[m]*rD[m];

    // Every thread loops over all n
    for (int n = 0; n < N; n++) {

        float expFhD = 2*PI*(kx[m]*x[n] + ky[m]*y[n] + kz[m]*z[n]);
        float cArg = cos(expFhD);
        float sArg = sin(expFhD);

        rFhD[n] += rMu[m]*cArg - iMu[m]*sArg;
        iFhD[n] += iMu[m]*cArg + rMu[m]*sArg;
    }
}


It says: "The kernel will not execute correctly due to conflicts between threads in writing into rFhD and iFhD arrays."

I can’t understand why there are conflicts.

For example,

rMu[0] = 2
iMu[0] = 3

rMu[1] = ..
iMu[1] = ..

rMu[2] = ..
iMu[2] = ..
rFhD[0] += 2 * cArg - 3 * sArg
iFhD[0] += 3 * cArg + 2 * sArg

I can understand that all the values of rMu and iMu are generated at the same time.
But when rFhD and iFhD are calculated inside the loop, shouldn’t each value of rMu and iMu (granted, we don’t know whether it will be rMu[0] or rMu[2] or … first) be accumulated into each element of rFhD and iFhD?

With this code, the book says that we don’t have conflicts anymore:


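__global__ void cmpFHd(float* rPhi, float* iPhi, float* phiMag,
                       float* kx, float* ky, float* kz,
                       float* x, float* y, float* z,
                       float* rMu, float* iMu,
                       float* rFhD, float* iFhD,  // missing from the quoted parameter list; needed to compile
                       int M) {

    // One thread per n
    int n = blockIdx.x * FHD_THREADS_PER_BLOCK + threadIdx.x;

    // Every thread loops over all m
    for (int m = 0; m < M; m++) {

        float expFhD = 2*PI*(kx[m]*x[n] + ky[m]*y[n] + kz[m]*z[n]);
        float cArg = cos(expFhD);
        float sArg = sin(expFhD);

        rFhD[n] += rMu[m]*cArg - iMu[m]*sArg;
        iFhD[n] += iMu[m]*cArg + rMu[m]*sArg;
    }
}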


Now each thread works on its own value of n. I’m OK with that.
But the first code confuses me.

Note the “+=” after ‘rFhD[n]’ and ‘iFhD[n]’: it reads from and then writes to each array.

In the first example, every thread loops over the same values of ‘n’, so many threads update the same ‘rFhD[n]’ and ‘iFhD[n]’ at the same time. Several threads can read the same old value, each add their own term, and then store the result. The last store wins: it contains only what that one thread calculated, and the contributions of the other threads are overwritten.
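
Here is a minimal sketch of that lost-update effect, stripped down to a single shared element. The names racyAdd, atomicAddKernel and runSum are made up for this illustration and are not from the book, and the sketch assumes a GPU that supports atomicAdd on float.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Every thread updates the SAME element, like rFhD[n] in the first kernel.
__global__ void racyAdd(float* sum) {
    *sum += 1.0f;          // load, add, store: three steps, so updates can be lost
}

// Same update, but the read-modify-write is done as one indivisible operation.
__global__ void atomicAddKernel(float* sum) {
    atomicAdd(sum, 1.0f);
}

// Hypothetical helper: runs one of the kernels and returns the final sum.
float runSum(bool useAtomic, int blocks, int threads) {
    float* dSum = nullptr;
    float hSum = 0.0f;
    cudaError_t err = cudaMalloc(&dSum, sizeof(float));
    assert(err == cudaSuccess);
    cudaMemcpy(dSum, &hSum, sizeof(float), cudaMemcpyHostToDevice);

    if (useAtomic) atomicAddKernel<<<blocks, threads>>>(dSum);
    else           racyAdd<<<blocks, threads>>>(dSum);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(&hSum, dSum, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dSum);
    return hSum;
}
```

With 64 blocks of 256 threads, runSum(true, 64, 256) returns exactly 16384. runSum(false, 64, 256) usually returns a much smaller number, and it can change from run to run, because most of the additions are overwritten.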

In the second example, ‘n’ is different for each thread, so each thread reads and writes a different element of ‘rFhD’ and ‘iFhD’. Therefore there are no conflicts between threads.
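
Because each thread owns its element, it can even keep the running sums in private variables and write them back once. The following is only a sketch under my own assumptions, not the book's code: the bounds check, the extra N parameter, the PI definition, the use of blockDim.x, and the gatherDemo driver are all added for illustration.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define PI 3.14159265358979f  // assumed definition; the book's macro is not shown

// Thread n is the only thread that touches rFhD[n] and iFhD[n].
__global__ void cmpFhDGather(const float* kx, const float* ky, const float* kz,
                             const float* x, const float* y, const float* z,
                             const float* rMu, const float* iMu,
                             float* rFhD, float* iFhD, int M, int N) {
    int n = blockIdx.x * blockDim.x + threadIdx.x;
    if (n >= N) return;                      // guard added for this sketch

    float rAcc = rFhD[n], iAcc = iFhD[n];    // private running sums
    for (int m = 0; m < M; m++) {
        float expFhD = 2 * PI * (kx[m]*x[n] + ky[m]*y[n] + kz[m]*z[n]);
        float cArg = cosf(expFhD), sArg = sinf(expFhD);
        rAcc += rMu[m]*cArg - iMu[m]*sArg;
        iAcc += iMu[m]*cArg + rMu[m]*sArg;
    }
    rFhD[n] = rAcc;                          // one write per thread, to its own element
    iFhD[n] = iAcc;
}

// Hypothetical driver: all k and position values are zero, so cArg = 1 and
// sArg = 0, and with rMu = iMu = 1 every rFhD[n] should come out equal to M.
float gatherDemo(int M, int N) {
    std::vector<float> zeros(M > N ? M : N, 0.0f), ones(M, 1.0f);
    float *dZero, *dMu, *dR, *dI;
    cudaMalloc(&dZero, zeros.size() * sizeof(float));
    cudaMalloc(&dMu, M * sizeof(float));
    cudaMalloc(&dR, N * sizeof(float));
    cudaMalloc(&dI, N * sizeof(float));
    cudaMemcpy(dZero, zeros.data(), zeros.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dMu, ones.data(), M * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dR, 0, N * sizeof(float));
    cudaMemset(dI, 0, N * sizeof(float));

    cmpFhDGather<<<(N + 255) / 256, 256>>>(dZero, dZero, dZero, dZero, dZero, dZero,
                                           dMu, dMu, dR, dI, M, N);
    assert(cudaGetLastError() == cudaSuccess);

    float last = 0.0f;  // read back the last output element as a spot check
    cudaMemcpy(&last, dR + (N - 1), sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dZero); cudaFree(dMu); cudaFree(dR); cudaFree(dI);
    return last;
}
```

No atomics are needed here, since no two threads ever write the same address.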

So, for example:

rFhD[0] = rFhD[0] + rMu[m]…, where m can belong to any thread at the same time, and only one of those writes survives? Is that why?

I am confused because rFhD is updated inside a loop, so each thread goes through all n values one by one.


So, it is because all threads try to read and write the same rFhD elements, right? All threads run at the same time, and we can’t tell in which order they read and write rFhD.