Different results when compiling in Debug and Release mode

I’ve never seen this before.
I’m getting slightly different results when compiling in debug mode.
The results are exactly the same from run to run so I don’t think it’s a race condition.
What else could cause accuracy differences between Debug and Release builds?

If your code uses floating-point computation, the most likely reason is that an FADD dependent on an FMUL will frequently be contracted to an FMA (fused multiply-add) in a release build. To confirm that this explains the differences, you can turn off the contraction by building with -fma=false. Since disabling it typically has a negative impact on both accuracy and performance, you wouldn’t want to use that for your production build, but it is useful for experiments.

I take your point about there being no race condition, but I would also suggest running the code under cuda-memcheck, including separate runs with each of the sub-tool options (initcheck, synccheck, racecheck, etc.)

http://docs.nvidia.com/cuda/cuda-memcheck/index.html#abstract
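For completeness, the sub-tools are selected with the `--tool` switch; the application name below is a placeholder:

```shell
# default memcheck pass, then each specialized sub-tool (app name is hypothetical)
cuda-memcheck ./myapp
cuda-memcheck --tool racecheck ./myapp   # shared-memory data races
cuda-memcheck --tool initcheck ./myapp   # reads of uninitialized device memory
cuda-memcheck --tool synccheck ./myapp   # invalid __syncthreads() usage
```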

If I put -fma=false in the additional options, I get: nvcc fatal : Unknown option ‘fma’

If I put it in Additional Compiler Options, I get:
1>cl : Command line warning D9002: ignoring unknown option ‘-fma=false’

Where should I put this in the VS project properties?

Sorry, I mistyped: -fmad=false. Hint: nvcc --help will list the command line arguments it accepts.

Is it by any chance supposed to be --fmad=false ?
ETA: just saw your correction

SOLVED: The difference was due to fused multiply add.

Disabled it in the Release build and the results are identical to Debug.
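For anyone landing here later, the correctly spelled flag goes on the nvcc command line (file names below are placeholders); it belongs with the nvcc options, not the host compiler options, which is why cl warned about it earlier:

```shell
# disable contraction of FMUL + FADD into FMA (useful for diagnosis only)
nvcc -O3 --fmad=false -o myapp myapp.cu
```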

Keep in mind that the use of fused multiply-add (FMA) usually has a significant positive impact on performance, and a noticeable positive impact on accuracy, so you would ultimately want to allow the compiler to use the contraction for release builds.

Since FMA contraction is an optimization, and debug builds are unoptimized, these kinds of numerical discrepancies are common (and occur in like fashion on CPUs that support FMA). One way around this that preserves full performance is to code the FMAs directly, using the standard C/C++ math library functions fma() and fmaf(). Depending on the nature of your code, that may be trivial to do, or a pain in the neck.

Personally, I have gotten into the habit of using fma() directly in the source code: Often there are multiple ways to re-arrange a computation for the use of FMA, only one of which is “optimal” in terms of accuracy. The compiler does not understand numerical analysis, it just walks the DAG constructed from the source expression and contracts FADD dependent on FMUL.

@njuffa, they’re trying to mechanize your knowledge! :)

http://herbie.uwplse.org/

Interesting, hadn’t heard of that. I am going to take a look. I am not a numerical analysis guy either (my degree is in CS, not math), but I have a reasonable understanding of many numerical issues. Just the other week I had to investigate, in brute-force fashion, the most accurate sequence to use for the most significant terms of a polynomial. While this seemed like a simple situation, the best variant I found (out of the five or so arrangements I tried) was not at all what I would have expected.

So any tool that can reason intelligently about numerics, in particular in the presence of FMA, has the potential to save quite some time. Overall there seems to be too little written in the literature about all the improvements that use of FMA enables.