I have a very long FORTRAN 77 code (close to 100,000 lines) that I recently parallelized using OpenMP. It reads a configuration file with several real and integer parameters, reads in data files, and runs very long simulations.
I have been using PGI version 14.9 on an Intel Xeon E5-2680 (v2) system with 40 threads without any problems. I recently expanded the input file to have 10 more real*8 parameters and 10 more integer parameters. Their default values are all zeros. This code worked in serial and parallel mode. The other day I set the first new parameter (call it sw80) to a non-zero value. The code would hang when running in parallel, but would work in serial mode.
The code runs in parallel when compiled with gfortran. I did a lot of different experiments with different compilations, and here is a summary:
command: pgf90 -Mextend -mcmodel=medium -O2 -o fred_pgi fred.for
result: runs when sw80 is both zero and non-zero
command: pgf90 -Mextend -mcmodel=medium -O2 -mp -o fred_pgi_par fred.for
result: runs when sw80=0.0, hangs otherwise
command: pgf90 -Mextend -mcmodel=medium -O2 -mp -Mbounds -o fred_pgi_par_bounds fred.for
result: runs in parallel (but slowly) when sw80 is zero and nonzero
command: pgf90 -Mextend -mcmodel=medium -O1 -mp -o fred_pgi_par_O1 fred.for
result: runs in parallel for all values of sw80
command: gfortran -ffixed-line-length-132 -O2 -mcmodel=medium -o fred_gfort fred.for
result: runs for all values of sw80, gives same results as fred_pgi
command: gfortran -ffixed-line-length-132 -O2 -mcmodel=medium -fopenmp -o fred_gfort_par fred.for
result: runs in parallel for all values of sw80.
The code seems to run fine in serial mode, whether it is compiled with PGI or gfortran. I can use -O3 for PGI and -O2 for gfortran and get similar results (gfortran messes up when I try to use -O3). I used the bounds-check option and did not find any instance of an out-of-bounds array index. It also runs in parallel mode when compiled with gfortran. The fact that the code runs in parallel when compiled with PGI with bounds checking turned on, but not with it turned off, seems to point to a problem with the compiler.
I believe I have located the point in the code where things go bad. When sw80 is not zero, an array with length 8 is populated with nonzero values in the subroutine where most of the action is. If I then set that array back to zero values after the array is filled, the parallel code runs normally (although with incorrect results).
When I say the code “hangs”, that is not quite 100% true. I inserted a write statement in the routine where the error seems to be, and the code does seem to enter and leave that subroutine. However, things that are supposed to happen after that subroutine do not happen. The subroutine is entered a few more times, then the code actually does hang, as the “I’m in this subroutine” messages stop.
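A minimal sketch of this kind of enter/leave diagnostic is below. It is not the actual code: the subroutine name work, the four-iteration loop, and the messages are hypothetical placeholders. The flush after each write is an addition that keeps buffered output from making the stall look earlier than it really is. It assumes that unit 6 is standard output and that the compiler accepts the flush subroutine extension, which I believe both PGI and gfortran do.

```fortran
      program diag
c     Hypothetical stand-in for the real driver loop.
      implicit none
      integer i
!$omp parallel do
      do i = 1, 4
         call work(i)
      end do
!$omp end parallel do
      write(*,*) 'all iterations done'
      call flush(6)
      end

      subroutine work(i)
c     Hypothetical stand-in for the subroutine that misbehaves.
      implicit none
      integer i, tid
      integer omp_get_thread_num
      external omp_get_thread_num
c     tid stays 0 in a serial build; the line starting with the
c     conditional-compilation sentinel is only compiled when OpenMP
c     is enabled (-mp for PGI, -fopenmp for gfortran).
      tid = 0
!$    tid = omp_get_thread_num()
      write(*,*) 'enter work: i =', i, ' thread =', tid
c     Assumption: unit 6 is stdout; flush is a compiler extension.
      call flush(6)
c     ... the real computation would go here ...
      write(*,*) 'leave work: i =', i, ' thread =', tid
      call flush(6)
      end
```

Built with the same flags as the failing executable (-O2 -mp), the last "enter" line without a matching "leave" line would show which call and which thread stalls.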
The parallel code fails even when using 1 thread (if compiled with the -mp option and O2 or O3 optimization).
Since the failure does not need more than one thread, a race between threads seems unlikely to be the cause.
I can send over the code for someone to look at.