Hi,
I have a problem with the following kernel, which may be a race condition. I tried adding __threadfence(), but nothing changes and the results are still wrong.
Here is the kernel:
Inputs:
ADiagnol:
1.016585 0.683285 3.045785 0.320685
ASubDiagnol:
4.242600 1.779500 0.988300
Results from the kernel:
Q:
1.653077 3.045785 0.320685
R:
1.730515 0.988300
Z:
4.362694 1.779500 0.988300 0.000000
X:
1.016585 -3.966595 0.00000 0.00000
Y:
4.242600 0.414655 0.000000
Correct values [computed by CPU]:
Q:
1.653077 0.868369 0.956883
R:
1.730515 0.404530
Z:
4.362694 4.347469 3.109890 -0.01750