Odd error, possibly due to numerical precision?

This is an odd behaviour to describe, and I doubt I’ll be able to describe it accurately enough to pinpoint the bug.

In short, I have a Fortran 90 function (aeroprop), which I copied to another file where both the original signal (X) and the differentiated signal (dX) are output.

I typically test all my code with the GNU compiler before the PGI compiler, mostly because the GNU compiler offers more compile-time warnings, and I assume code that works under both compilers is a good sign… That leads me to the problem I struggled with for three days (and am not convinced I’ve solved).

To test this code, I was matching the output X from the two files. When compiled with gfortran, the outputs were identical (at least to a tolerance of 1e-10), but when compiled with PGI they differed by up to 1e-2.
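To be concrete, the check amounts to something like the sketch below (the array length and fill values here are placeholders, not the real I/O; in practice the two signals come from the two builds’ output files):

  program compare_x
    implicit none
    integer, parameter :: n = 1000              ! placeholder signal length
    real(kind(1.0d0)) :: x_a(n), x_b(n), maxdiff
    integer :: i
    x_a = 0.0d0                                 ! dummy fill so the sketch runs
    x_b = 0.0d0
    maxdiff = 0.0d0
    do i = 1, n
       maxdiff = max(maxdiff, abs(x_a(i) - x_b(i)))
    end do
    print *, 'max |X_a - X_b| =', maxdiff       ! compare against 1e-10
  end program compare_x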

So, code from file A (aeroprop) and file B were outputting different signals (in X, not dX), even though the code for anything that could affect X was identical across both files.

Moreover, I found that if I commented out a variable that is never read (by the code computing X), then the concentrations in X would be identical under the PGI compiler.

I don’t know what to make of this problem. Strangely enough, changing this line:

TRAMASS = max(1.0e-33, rgrid(i,l,no))

to use 1.0e-7 rather than 1.0e-33 fixed it (though 1.0e-8 still had the problem). I suspect this fixes things only by coincidence.
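If it’s relevant, a kind-explicit version of that floor would look like the sketch below (wp and the wrapper program are hypothetical, and I have no evidence the literal’s kind is actually at fault):

  program floor_demo
    implicit none
    integer, parameter :: wp = kind(1.0d0)        ! assumed working precision
    real(wp), parameter :: floor_val = 1.0e-33_wp ! kind-explicit literal
    real(wp) :: tramass, sample
    sample = 0.0_wp                               ! stands in for rgrid(i,l,no)
    tramass = max(floor_val, sample)
    print *, tramass
  end program floor_demo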

In any case, the code files are too long to post here, even the ones I truncated down to focus on the offending lines, and I realize my description is too vague to solicit a fix, but I am interested in suggestions.

Has anyone encountered something like this? Or have suggestions on how to proceed?

Hi khea_actua1,

There are a few possibilities: a compiler bug (in either PGI or gfortran), a bug in your program, an optimization issue, or some other difference in how the two compilers compile your code.

The first thing to do is compile your program without optimization and with debugging enabled (-g) with both compilers. If the problem persists, then it’s more likely a problem with your code or how it’s being built. Try compiling again with the diagnostic flags “-Mbounds -Mchkstk -Mchkptr”. Also, try using valgrind (www.valgrind.org) to check for uninitialized memory reads (UMRs). A UMR could cause the problem you describe.
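To illustrate what I mean by a UMR, here’s a small made-up example (not from your code) of the kind of read valgrind’s memcheck will flag; the result depends on whatever garbage happens to be in the unassigned element:

  program umr_demo
    implicit none
    real, allocatable :: x(:)
    integer :: i
    allocate(x(10))          ! contents of x are undefined after allocate
    do i = 1, 9              ! note: x(10) is never assigned
       x(i) = real(i)
    end do
    print *, sum(x)          ! reads the uninitialized x(10)
    deallocate(x)
  end program umr_demo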

If the debug versions produce the same X, next try adding your optimization flags back incrementally until the answers begin to differ.

Let me know what you discover,
Mat

Thanks for the reply!

I started by removing the optimization flags as you suggested, and right away it worked. The flag that was causing the problem was the -fast flag.

This flag seems to be an alias for many other options, so it doesn’t narrow the problem down much, but it’s good to know.

I thought I had thoroughly checked for UMRs, so hopefully it isn’t that, though I won’t rule it out.

Adding the flags you suggested, however, fixed the problem while still letting me use optimization. I’m reading up on what they do now.

Thanks again.

Hi khea_actua1,

“-fast” is an aggregate flag containing a set of common optimizations for high performance. It includes vectorization and faster intrinsics, both of which can reduce accuracy. The difference is usually no more than 1 ulp, but for highly sensitive programs it can cause trouble. For example, if your program iterates to convergence, slight changes in accuracy can change the point of convergence and hence the results.
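As an illustration (a made-up example, not taken from your code), a convergence loop like the one below can exit on a different iteration, and so return a slightly different answer, when the arithmetic changes by as little as an ulp:

  program converge_demo
    implicit none
    real :: x, x_new
    integer :: iter
    x = 1.0
    do iter = 1, 1000
       x_new = 0.5 * (x + 2.0 / x)        ! Newton iteration for sqrt(2.0)
       if (abs(x_new - x) < 1.0e-7) exit  ! tolerance near single-precision ulp
       x = x_new
    end do
    print *, 'iterations:', iter, ' result:', x_new
  end program converge_demo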

To check, try the following flag sets:

  • -O2
  • -O2 -Mvect
  • -fast -Kieee
  • -fast -Mnovect

-fast does vary slightly from platform to platform but usually includes the following flags. You can try each in turn to see how it affects both your results and your performance. Note that several of these optimizations require -O2 to be enabled.

  • -O2
  • -Munroll=c:1
  • -Mlre
  • -Mautoinline
  • -Mvect=sse
  • -Mpre

Hope this helps,
Mat