I am on Linux too. When I add the option “-fmad=false”, the error disappears.
But I do not know whether there are other rules that may affect my results. I would appreciate it if you could read my last topic; that problem is an example.
Yes, with that code change I can reproduce your observation.
The -fmad=false observation pretty much confirms that the discrepancy in this case is due to FMA contraction, i.e. the compiler merging a multiply and a dependent add into a single fused multiply-add.
Usually, FMA contraction results in the same or better accuracy, not reduced accuracy, because the product is not rounded before the addition. So my guess is that the CPU result is actually the less accurate one, and when you turn off FMA contraction, the GPU result becomes identically less accurate.
While the CUDA toolchain treats floating-point computation conservatively in general (it does not apply the re-associations that various host compilers perform by default), it does have FMA contraction turned on by default, because the FMA (fused multiply-add) is the central building block of the GPU's computational units.
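To make the effect concrete, here is a small host-side C++ sketch (not taken from your code; the function names and input values are purely illustrative). It evaluates a*b + c once with two roundings and once with std::fma, which rounds only once, like the GPU's FMA instruction. In CUDA device code, the intrinsics __fmul_rn() and __fadd_rn() are documented as never being merged into an FMA, so they can give the unfused behavior for individual operations without turning off contraction for the whole file.

```cpp
#include <cmath>

// a*b + c with two roundings: the product is rounded to float before the add.
// Storing through a volatile float keeps the host compiler from contracting
// this expression into an FMA.
float mul_then_add(float a, float b, float c) {
    volatile float p = a * b;  // first rounding
    return p + c;              // second rounding
}

// a*b + c with a single rounding, which is what the GPU computes when nvcc
// contracts the expression into an FMA (the default, -fmad=true).
float fused(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-12 and c = -(1 + 2^-11), the exact value of a*a + c is 2^-24. The product a*a rounds to 1 + 2^-11 in float (a tie, resolved to even), so mul_then_add(a, a, c) cancels to exactly 0, while fused(a, a, c) returns 2^-24 (about 5.96e-08). Neither result is "wrong" from the compiler's point of view; they simply follow different rounding sequences, and the fused one is closer to the exact answer.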
As already discussed, FMA contraction can be turned off by adding -fmad=false to the nvcc command line, but this will in general have a negative impact on performance and accuracy. This NVIDIA blog post has a link to an NVIDIA white paper on floating-point topics, including the use of FMA:
Note that modern CPUs also include an FMA instruction, so if your host compiler supports FMA contraction, you might want to try turning that on. Generally speaking, one should not expect bit-wise matching floating-point results between any two platforms: results can differ for any number of reasons, including different compiler or library versions.
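Since bitwise agreement cannot be expected, a CPU reference is usually compared against GPU results with a tolerance. Below is a minimal C++ sketch of a comparison measured in units in the last place (ULPs). It assumes IEEE-754 binary32 floats; the helper names and the max_ulps threshold are hypothetical and have to be chosen per application.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Maps a float's bit pattern onto an ordered integer line, so that adjacent
// floats differ by exactly 1 and +0.0f / -0.0f both map to 0.
std::int64_t ordered_bits(float x) {
    std::int32_t i;
    std::memcpy(&i, &x, sizeof i);  // reinterpret the bits without UB
    // Negative floats are stored as sign-magnitude; mirror them below zero.
    return i < 0 ? std::int64_t(INT32_MIN) - i : std::int64_t(i);
}

// Number of representable floats separating a and b (assumes no NaNs).
std::int64_t ulp_distance(float a, float b) {
    std::int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}

// Accepts a GPU result if it lies within max_ulps of the CPU reference.
// max_ulps is a hypothetical, application-specific tolerance.
bool nearly_equal(float gpu, float cpu, std::int64_t max_ulps) {
    if (std::isnan(gpu) || std::isnan(cpu)) return false;
    return ulp_distance(gpu, cpu) <= max_ulps;
}
```

A ULP tolerance scales with the magnitude of the values, which suits results that differ only by a rounding step or two. Near zero, where cancellation can make a tiny exact result look enormous in ULPs, it is common to combine it with an absolute tolerance.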