possible bug with OpenMP in FORTRAN

Hi All,

I have this very long FORTRAN 77 code (close to 100,000 lines) that I have recently made parallel using OpenMP. It reads a configuration file with several real and integer parameters, reads in data files, and runs very long simulations.

I have been using PGI version 14.9 on an Intel Xeon E5-2680 (v2) system with 40 threads without any problems. I recently expanded the input file to have 10 more real*8 parameters and 10 more integer parameters. Their default values are all zeros. This code worked in serial and parallel mode. The other day I set the first new parameter (call it sw80) to a non-zero value. The code would hang when running in parallel, but would work in serial mode.

The code runs in parallel when compiled with gfortran. I did a lot of different experiments with different compilations, and here is a summary:


command: pgf90 -Mextend -mcmodel=medium -O2 -o fred_pgi fred.for
result: runs when sw80 is both zero and non-zero

command: pgf90 -Mextend -mcmodel=medium -O2 -mp -o fred_pgi_par fred.for
result: runs when sw80=0.0, hangs otherwise

command: pgf90 -Mextend -mcmodel=medium -O2 -mp -Mbounds -o fred_pgi_par_bounds fred.for
result: runs in parallel (but slowly) when sw80 is zero and nonzero

command: pgf90 -Mextend -mcmodel=medium -O1 -mp -o fred_pgi_par_O1 fred.for
result: runs in parallel for all values of sw80

command: gfortran -ffixed-line-length-132 -O2 -mcmodel=medium -o fred_gfort fred.for
result: runs for all values of sw80, gives same results as fred_pgi

command: gfortran -ffixed-line-length-132 -O2 -mcmodel=medium -fopenmp -o fred_gfort_par fred.for
result: runs in parallel for all values of sw80.

The code seems to run fine in serial mode, whether it is compiled using PGI or gfortran. I can use -O3 for PGI and -O2 for gfortran and get similar results (gfortran messes up when I try to use -O3). I used the bounds-check option and did not find any instance of an out-of-bounds array index. It also runs in parallel mode when compiled with gfortran. The fact that the code runs in parallel when compiled with PGI with the bounds-checking option turned on, but not when it is turned off, seems to point to a problem with the compiler.

I believe I have located the point in the code where things go bad. When sw80 is not zero, an array with length 8 is populated with nonzero values in the subroutine where most of the action is. If I then set that array back to zero values after the array is filled, the parallel code runs normally (although with incorrect results).

When I say the code “hangs” that is not quite 100% true. I inserted a write statement in the routine where the error seems to be, and the code does seem to enter and leave that subroutine. However, things that are supposed to happen after that subroutine do not happen. The subroutine is entered a few more times, then the code actually does hang, as the “I’m in this subroutine” messages stop.

The parallel code fails even when using 1 thread (if compiled with the -mp option and O2 or O3 optimization).

I can send over the code for someone to look at.

Jerry

Hi Jerry,

Yes, please send the code to PGI Customer Service at trs@pgroup.com.

My best guess is that it’s a stack size issue and that you need to set your shell stacksize to unlimited. That’s the typical reason for run-time errors with OpenMP, given that local arrays are put on the stack rather than the heap. Your description doesn’t quite fit this, though, so we’ll need to take a look to better understand why you’re getting this error.
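
For anyone who wants to try the stack-size suggestion, here is a minimal sketch for a bash session. The 512M value is only an illustrative assumption, not a number from this thread, and fred_pgi_par is the executable name from the compile lines above. OMP_STACKSIZE is the standard OpenMP environment variable for worker-thread stacks; the sketch assumes the OpenMP runtime in use honors it.

```shell
# Raise the stack limit for the main thread in this shell session.
# This can fail if the hard limit is lower; the old limit then stays in place.
ulimit -s unlimited 2>/dev/null || echo "could not raise the stack limit"

# Stack size for each OpenMP worker thread (standard OpenMP variable).
# 512M is an illustrative value only (assumption).
export OMP_STACKSIZE=512M

# Show what is in effect before launching the program.
echo "stack limit: $(ulimit -s)"
echo "OMP_STACKSIZE: $OMP_STACKSIZE"

# Then run the parallel build from the same shell, for example:
# ./fred_pgi_par
```

Both settings apply only to the current shell and the programs started from it, so they need to be set again in a new session or in a batch script.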

- Mat

Hi Mat,

I sent the codes, the input data, and hopefully a helpful README file to the address indicated.

Increasing the stack size did not seem to help. Usually when the stack size is too small, the code crashes immediately with the ever-helpful message “segmentation fault”. In this case, the code hangs. It uses more than 100% of a CPU, and lots of RAM, just like it does when it runs properly in parallel mode.

Jerry


Hi Jerry,

The problem appears to be a bug in the compiler. I have filed a bug (#21318) for this issue. To work around the bug you can add the following directive:

cpgi$l novector

before the loop that starts at line 27189 in lcsubs.for. The compiler is vectorizing this loop which appears to be causing the problem. This directive instructs the compiler to turn off vectorization for this loop. It would look something like:

cpgi$l novector
      do 9 kk=1,8
c

Hope that helps.

-craig

Hi All,

I have just installed the version 15.5 compilers. I believe I am still encountering the same sort of compiler bug, using a slightly different code. I have several different modeling codes, and all of them call the same subroutines that do most of the work. Each modeling code implements a different optimization technique, for example a genetic algorithm, an amoeba, a differential evolution Markov chain, etc.

In my original message, I had problems with the differential evolution code where I could not get the parallel version to run with a compiler optimization level O2 or higher. The workaround suggested above fixed the problem. Now my other code that uses the parallel genetic algorithm will not run when compiled at an optimization level of O2 or higher. I get a segmentation fault when it tries to enter the parallel loop. The code does run when compiled with gfortran with level O2. As noted earlier, these two codes are mostly the same.

Is there an easy way for me to tell where the failure occurs?

Jerry

The problem noted above seems to be partly related to the stack size and other memory considerations. I made a few arrays smaller, and I was able to get the code to run in parallel. However, I found a new issue that I will post to a new thread.

Jerry

I just checked and the bug that was filed (#21318) has still not been fixed. Perhaps you are running into another instance of this problem.

Possibly. In the previous instance, the code would take up CPU cycles and occupy RAM, but would not do anything else. In this case, the code would run and produce output, but would take up more RAM as it went along. I had the arrays such that it eventually exceeded the system capacity and died. When I adjusted the array sizes, the code would work, as it would not be using all of the RAM.

I have posted about the issue of more and more RAM being used in a different thread.

Jerry