If your code uses floating-point computation, the most likely reason is that an FADD that depends on an FMUL will frequently be contracted into an FMA (fused multiply-add) in a release build. To confirm that this explains the differences, you can turn off this contraction by building with -fmad=false. Since this typically has a negative impact on both accuracy and performance, you wouldn't want to use it for your production build, but it is useful for experiments.
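As a rough illustration (a minimal sketch, not from the original post; the file name fma_demo.cu and the input values are just made-up examples), the kernel below contains an expression of the form a*b + c that is a candidate for contraction. Building it once with the default settings and once with -fmad=false can produce bitwise-different results for the same inputs, because the FMA keeps the full product unrounded before the add:

```cpp
// fma_demo.cu (hypothetical example)
// Compile and compare:
//   nvcc -O3 fma_demo.cu -o fma_on      (contraction allowed, the default)
//   nvcc -O3 -fmad=false fma_demo.cu -o fma_off
#include <cstdio>
#include <cfloat>

__global__ void mul_add(const float *a, const float *b, const float *c, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // With contraction (default) this may compile to a single FMA, which
        // does not round the product a[i]*b[i] before the addition.
        // With -fmad=false it compiles to a separate FMUL followed by FADD.
        out[i] = a[i] * b[i] + c[i];
    }
}

int main()
{
    // (1 + eps) * (1 - eps) - 1: the separate FMUL/FADD sequence rounds the
    // product to 1.0f and prints 0, while an FMA preserves the tiny residual.
    float ha = 1.0f + FLT_EPSILON, hb = 1.0f - FLT_EPSILON, hc = -1.0f, hout = 0.0f;
    float *da, *db, *dc, *dout;
    cudaMalloc(&da, sizeof(float));
    cudaMalloc(&db, sizeof(float));
    cudaMalloc(&dc, sizeof(float));
    cudaMalloc(&dout, sizeof(float));
    cudaMemcpy(da, &ha, sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, &hb, sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dc, &hc, sizeof(float), cudaMemcpyHostToDevice);
    mul_add<<<1, 1>>>(da, db, dc, dout, 1);
    cudaMemcpy(&hout, dout, sizeof(float), cudaMemcpyDeviceToHost);
    printf("a*b + c = %.9g\n", hout);  // expect different output for fma_on vs fma_off
    return 0;
}
```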