possible compiler bug

Hi All,

I have this code that I have mentioned previously. It now runs in parallel via OpenMP. I use the code to model observations of binary stars. It seemed to be working wonderfully on one data set. When I switched data sets, I discovered a problem in parallel mode: the parallel build started giving different results from the single-core build. I started putting write statements here and there, and found that a critical array contained NaN values rather than normal numbers.

I then started using compile flags like -Mbounds and -Ktrap=inexact. This is when I discovered this odd behavior:


          subroutine getT0(finc,period,ecc,argper,T0,Tconj)
c                                                                               
c   January 30, 2010.                                                           
c                                                                               
c   This routine is the inverse of getcontimes.  Given a time of transit,       
c   it will figure out the T0 needed.                                           
c                                                                               
          implicit double precision (a-h,o-z)
c                                                                               
          pie=3.141592653589793d0
          write(*,*)pie
c          pie=4.0d0*atan(1.0d0) !pie=3.141592653589793d0                       
          write(*,*)finc,pie
          fincr=finc*pie/180.0d0
          write(*,*)fincr
          omegar=(argper+180.0d0)*pie/180.0d0
.
.
.

Here is the output when compiled without the -K flag:

3.141592653589793
89.62637196236822 3.141592653589793
1.564275287360457

Here is the output when compiled with that flag:

3.141592653589793
89.62637196236822 3.141592653589793
Floating exception

As you can see, I have tried various approaches, such as simply assigning pi as a constant or computing it with the arctan function. When I use the arctan function, the exception appears on that line instead.

I get the correct results from the parallel code when it is compiled with gfortran, and I could not reproduce the floating exception there, although I might not have used the correct compiler flag(s).

I have an AMD system (“piledriver”) and PGI 14.6. Here are typical compile commands:

pgf90 -mp -O2 -Mextend -tp barcelona -mcmodel=medium -o fred fred.for

(gives wrong results, but runs)


pgf90 -O2 -Mextend -tp barcelona -mcmodel=medium -o fred fred.for

(gives correct results, but single-core)


pgf90 -Ktrap=inexact -mp -O2 -Mextend -tp barcelona -mcmodel=medium -o fred fred.for

(dies)


pgf90 -Ktrap=inexact -O2 -Mextend -tp barcelona -mcmodel=medium -o fred fred.for

(dies)


I also compiled (in single-core mode) on a slightly older piledriver system with PGI version 13.6, and I get the floating point exception. As noted earlier, I cannot use the -mp flag on this older system.

I tried using the debugger, but I could not figure out how to get the debugger to tell me where the NaN values were produced.

What else can I do at this point? Any and all advice is welcome.

Jerry

Just to follow up on my post. The -Ktrap=inexact flag may not be appropriate for this case. I have done some further testing, and here is what is happening.

  • code compiled single-core (no -mp flag): works as expected

  • code compiled with the -mp flag: gives wrong results on the problem data set, even with one thread

  • code compiled with the -Mbounds and -mp flags: gives correct results, with one or more threads

  • code compiled with the -mp flag: gives correct results with a different input data set

So something is wrong somewhere, and I would appreciate advice on how to use the debugger and/or different compiler flags to track this down.

Jerry

Hi Jerry,

One more experiment to try is to comment out your OpenMP directives but still compile with “-mp”. “-mp” will put more of your local variables on the stack (so that each thread gets its own copy), which can perturb behavior. If the wrong answers persist, next try running under Valgrind (http://www.valgrind.org) to see if you have any memory issues such as a UMR (uninitialized memory read).
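A typical Valgrind run might look like the following sketch (file and input names are placeholders; the compile flags are the ones from earlier in the thread):

```shell
# Build with debug info and no optimization so Valgrind can report
# useful source locations (placeholder file names):
pgf90 -g -O0 -mp -Mextend -mcmodel=medium -o fred fred.for

# --track-origins=yes makes Memcheck report where each
# uninitialized value was originally created:
valgrind --track-origins=yes ./fred < input.dat
```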

You can also conduct a binary search to narrow down which file causes the wrong answers: compile half of the files with -mp and half without, rerun, and repeat until you have it narrowed down to a single file.

If the error goes away without the directives, start adding the OpenMP directives back in one at a time and see if you can determine in which region the error occurs. You may have a private variable that’s not getting initialized.

  • Mat

Hi Mat,

Thanks for the tips. I have discovered that the -Kieee flag along with -mp makes the code run correctly with a single thread and with multiple threads.

The code runs fine in single-core mode and parallel mode with gfortran. So I don’t think it is directly related to issues of private vs. shared variables or uninitialized variables.

When I commented out the !$omp lines and compiled with -mp, the error is still there. I’ll try Valgrind.

Thanks,

Jerry

Hi Jerry,

I have discovered that the -Kieee flag along with -mp makes the code run correctly with a single thread and with multiple threads.

Then it’s most likely a precision issue. Also, instead of using “-Kieee”, try “-Mnofma”.

FMA instructions are considered more accurate since they perform the multiply and add with a single rounding, but they can lead to results that don’t agree with non-FMA arithmetic.

It might also explain why you don’t see the issue with gfortran, assuming your version doesn’t generate FMA instructions.

  • Mat

Hi Mat,

If it is a precision issue, then why does it matter whether the program runs normally on a single core (which works), vs. parallel mode (which does not work, even on one thread)?

Also, why would the -Mbounds flag make the code run correctly, with no out-of-bounds errors?

Jerry

If it is a precision issue, then why does it matter whether the program runs normally on a single core (which works), vs. parallel mode (which does not work, even on one thread)?

A different order of operations or different code generation can affect the precision. Also, -Mbounds could simply be perturbing an optimization, which in turn affects the precision.

How “wrong” are the answers? Just off a few bits? Orders of magnitude?

My suggestion, since you have a case where it works compiled one way and fails another, is to start compiling portions of the code with and without the flag until you can narrow down the exact code that’s causing the difference. There’s also the "!PGI$ OPT " directive, which you can use to lower optimization for a particular routine once you have it down to the file level.
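Sketched with placeholder file names (the flags are the ones from earlier in the thread), one round of that binary search might look like:

```shell
# Compile one half of the files with -mp, the other half without:
pgf90 -c -mp -O2 -Mextend -tp barcelona -mcmodel=medium part1.for
pgf90 -c     -O2 -Mextend -tp barcelona -mcmodel=medium part2.for
pgf90 -mp -mcmodel=medium -o fred part1.o part2.o

# Rerun on the failing data set; the half that must keep -mp for the
# failure to appear contains the problem file. Split it and repeat.
```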

  • Mat

Hi Mat,

It is orders of magnitude off. There is an array full of NaNs.

Valgrind has pointed me to a routine that seems to be emitting NaNs from time to time. I am looking into that now.

Thanks,

Jerry

Hi Mat,

I fixed the problem with the bad subroutine, and the code now runs as expected in parallel mode. I don’t see how the NaNs would affect the actual model in this case, as that subroutine was producing normal numbers when it was called for the input data; the NaNs came from a spurious extra call. It seems the serial code was able to ignore the NaNs, but the parallel code could not in this case.

Thanks again for the help,

Jerry