I found one problem in my kernel function regarding calculation precision.
In a division, 1023000 / 5000000, the result should be exactly 0.2046, but my kernel function gets 0.2046000063419342041015625… Why is there a residual after 0.20460000?
This difference directly causes other wrong results in my subsequent code.
I now see that the difference does not come from the calculation: it also appears if I assign the value 0.2046 directly to the result array. When I then copy the values from the GPU to the CPU and print them, the output shows 0.2046000063419342041015625
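The extra digits past …342 suggest to me that you are converting, casting, or doing something even more subtle, since dividing the doubles “1023000.0 / 5000000.0” would have given a different result. The value you printed is the single-precision (float) number closest to 0.2046, so the result is most likely being stored as a float somewhere along the way.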
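To make the float-versus-double point concrete, here is a small host-side C++ sketch (it should also compile inside a CUDA source file). It assumes the result array is declared as `float`, which the original post does not actually confirm. The helper names `quotient_as_float`, `quotient_as_double`, `nearly_equal` and `print_both` are made up for illustration, and the tolerance of 1e-6 is an arbitrary choice.

```cpp
#include <cmath>
#include <cstdio>

// The quotient computed and stored in single precision, the way a
// float result array would hold it (assumption: the array is float).
float quotient_as_float() { return 1023000.0f / 5000000.0f; }

// The same quotient computed in double precision.
double quotient_as_double() { return 1023000.0 / 5000000.0; }

// Hypothetical helper: compare two values with a relative tolerance
// instead of testing for exact equality with ==.
bool nearly_equal(double a, double b, double rel_tol = 1e-6) {
    double scale = std::fmax(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}

// Print both quotients with 25 decimal places so the difference in
// the trailing digits becomes visible.
void print_both() {
    std::printf("float : %.25f\n", quotient_as_float());
    std::printf("double: %.25f\n", quotient_as_double());
}
```

Calling `print_both()` should show the float line ending in the same digits as in the question, while the double line stays much closer to 0.2046. Neither type can hold 0.2046 exactly, so if later code depends on the value, a tolerance check such as `nearly_equal(result, 0.2046)` is usually safer than `result == 0.2046`.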