Problem with parallel code/OpenMP

Hi All,

I am having a problem with my large modeling code. This problem may be related to problems I have had earlier. The code runs in parallel using OpenMP. The number of computations done and the run time depend strongly on the specific data set being modeled. The code has been running fine on a small data set (i.e. fast runtime), both in serial mode and in parallel mode.

Recently I tried to run the code on a much larger data set. The code ran in serial mode with no problems (other than being relatively slow). However, in parallel mode the code hangs relatively early on. On occasion it will report this error:

Error: _mp_pcpu_reset: lost thread

If I compile the code with gfortran, it runs on the larger data set.

I have tried some basic tests, like using the -Mbounds flag, but so far nothing comes up. I have also tried using the debugger, but without much success. The code would hang, but I could not see a way to see anything useful about the state of the variables, etc.

I am using PGI version 16.5, and doing my tests on a Xeon system [Intel® Xeon® CPU E5-1650 v3 @ 3.50GHz].

Since the code runs in serial mode, one would assume the troubles are related to the stack. I have tried altering the stacksize, but that does not seem to help. Does anyone have any suggestions on what else I can do to figure this out?



Try adding


in case the issue is related to calculating memory addresses from array indices.

We could be doing the calculation in 32-bit integer format rather than
64-bit integer format. In a multidimensional array, each individual index
can fit easily in 32 bits while the product of the indices (and hence the
address offset) does not.
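
As a rough illustration of that arithmetic, here is a minimal sketch. The extents (20000 x 20000) and the 8-byte element size are illustrative values, not taken from your code. The offset is computed in 64-bit integers and compared against the largest 32-bit integer.

```fortran
program index_width
  use iso_fortran_env, only: int32, int64
  implicit none
  integer(int64) :: n1, n2, elem_bytes, last_offset

  n1 = 20000_int64         ! illustrative extent of the first dimension
  n2 = 20000_int64         ! illustrative extent of the second dimension
  elem_bytes = 8_int64     ! illustrative element size in bytes

  ! Byte offset of the last element, computed safely in 64-bit arithmetic.
  last_offset = (n1*n2 - 1_int64)*elem_bytes

  print *, 'largest 32-bit integer     :', huge(0_int32)
  print *, 'byte offset of last element:', last_offset

  if (last_offset > int(huge(0_int32), int64)) then
     print *, 'offset does not fit in 32 bits; 64-bit index math is needed'
  else
     print *, 'offset fits in 32 bits'
  end if
end program index_width
```

Each index here is only 20000, yet the byte offset is well beyond what a 32-bit integer can hold.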

The other issue could be related to initialization. If local variables are
allocated on the stack rather than in static memory, they can start out
holding garbage, whereas with compilers that use static storage they
always start out as zero.

Try compiling the code with gfortran using -Wuninitialized:

gfortran -c -Wuninitialized foo.f90

and that switch will look for local variables that are read before
ever being written to.
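
For reference, a minimal sketch of the pattern that switch is meant to catch. The function name total_of and the data are hypothetical. The code below is the correct form; the comment marks the assignment whose absence would leave the accumulator holding whatever happens to be on the stack.

```fortran
program init_demo
  implicit none
  real :: x(5)

  x = [1.0, 2.0, 3.0, 4.0, 5.0]
  print *, 'sum =', total_of(x, 5)

contains

  function total_of(v, n) result(total)
    integer, intent(in) :: n
    real, intent(in)    :: v(n)
    real                :: total
    integer             :: i

    ! Without this assignment, total would be read before it is written.
    ! With static storage it might happen to start at zero; with stack
    ! storage it can start as garbage.
    total = 0.0
    do i = 1, n
       total = total + v(i)
    end do
  end function total_of

end program init_demo
```

If the assignment to total is removed, gfortran may report the variable as used uninitialized; in my experience the warning is more reliable when optimization (for example -O1) is also enabled.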


Hi Dave,

I found out what causes this error:

Error: _mp_pcpu_reset: lost thread

There is a point in the code where a counter exceeds the dimensions of an array. I check for this and use the Fortran STOP statement when that happens. The parallel code encountered this condition at a point where the serial code did not. I did not notice this at first since there is a lot of screen output, especially in parallel mode.
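
One possible way to avoid calling STOP from inside a parallel region is to record the condition in a flag and stop after the region ends. This is only a sketch: the names maxcount, counter, and work, and the loop bounds, are hypothetical stand-ins for the real code.

```fortran
program overflow_flag
  implicit none
  integer, parameter :: maxcount = 100   ! hypothetical array bound
  integer :: i, counter
  logical :: overflow
  real    :: work(maxcount)

  overflow = .false.
  work = 0.0

  ! Each thread keeps its own copy of the flag; the copies are OR-ed
  ! together when the loop finishes.
  !$omp parallel do private(counter) reduction(.or.:overflow)
  do i = 1, 150
     counter = i
     if (counter > maxcount) then
        overflow = .true.          ! record the problem instead of stopping
     else
        work(counter) = real(i)
     end if
  end do
  !$omp end parallel do

  ! Back in serial code, where STOP does not strand other threads.
  if (overflow) then
     stop 'counter exceeded array bound'
  end if
  print *, 'finished without overflow'
end program overflow_flag
```

The directives are ordinary comments when OpenMP is not enabled, so the same source also builds and runs in serial mode.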

I will try to look for the uninitialized variables using that gfortran command.



Hi Dave,

I forgot to ask: I use the -mcmodel=large flag. How is the -Mlarge_arrays flag different?


-mcmodel dictates the format of the object files.
-mcmodel=medium essentially creates data references with >32-bit offsets,
using extra space in the program area and extra cycles to calculate
each address from the larger offset. That extra time is spent on every
reference, which hurts performance.

-mcmodel=large is not supported (program and data regions use
>32-bit addresses) but -mcmodel=medium is (data region uses >32-bit addresses).

-mcmodel=medium should be avoided if possible. Large arrays can be
created at runtime via allocate, and they are supported in the default
-mcmodel=small environment, where more compiler optimizations are
available. Large arrays can be passed to subroutines as dummy arrays,
and as long as any index variables are 64-bit integers, the code should
work with both large and ‘small’ arrays.

subroutine foo(a,b,n)
integer*8 n
real*8 a(n),b(n)
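
A slightly fuller sketch of the same idea, with the arrays allocated at runtime and passed to a subroutine whose size argument and loop index are 64-bit integers. The array size and the body of foo are illustrative only.

```fortran
program alloc_demo
  use iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: n
  real(real64), allocatable :: a(:), b(:)

  n = 1000000_int64            ! illustrative size; a real case may be far larger
  allocate(a(n), b(n))         ! heap allocation, no static data involved
  b = 1.0_real64

  call foo(a, b, n)
  print *, 'a(n) =', a(n)

  deallocate(a, b)

contains

  subroutine foo(a, b, n)
    integer(int64), intent(in) :: n
    real(real64), intent(out)  :: a(n)
    real(real64), intent(in)   :: b(n)
    integer(int64) :: i        ! 64-bit index, valid for large and small arrays

    do i = 1_int64, n
       a(i) = 2.0_real64*b(i)
    end do
  end subroutine foo

end program alloc_demo
```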

-Mlarge_arrays tells the compiler to use 64-bit integer math for
array-index-to-address calculations. This is needed for large arrays whether
they are allocated or statically assigned. You need it for any array where
the product of the dimensions*byte_size is >32-bits, like
real*8 a(20000,20000)


Hi Dave,

When I don’t use the -mcmodel=medium flag for the PGI compiler, I get errors like this:

relocation truncated to fit: R_X86_64_PC32 against symbol `realatm6_' defined in COMMON section in /tmp/pgf90lLPdHOVZIHZN.o
relocation truncated to fit: R_X86_64_32S against symbol `realatm4_' defined in COMMON section in /tmp/pgf90lLPdHOVZIHZN.o
additional relocation overflows omitted from the output

The only compiler flag I used was -Mextend.

Here is what the common blocks look like:

          parameter (maxlines=1300,maxmu=115)                                   
          dimension atmT(maxlines),atmg(maxlines),atmmu(maxlines,maxmu),
     %       Nmu(maxlines)
          dimension atmint1(maxlines,maxmu),atmint2(maxlines,maxmu)
          dimension atmint3(maxlines,maxmu),atmint4(maxlines,maxmu)
          dimension atmint5(maxlines,maxmu),atmint6(maxlines,maxmu)
          dimension atmint7(maxlines,maxmu),atmint8(maxlines,maxmu)
          common /realatm1/ atmT,atmg,atmmu
          common /realatm2/ atmint1,atmint2
          common /realatm3/ atmint3,atmint4
          common /realatm4/ atmint5,atmint6
          common /realatm5/ atmint7,atmint8
          common /realatm6/ Tmax,Tmin,gmax,gmin

gfortran does not give any similar errors.

The entire code is large, and about 700 MB of RAM is used during runtime. Downstream I have a three-dimensional array with dimensions 6,60,300000, and several two-dimensional ones with dimensions 300000 by 30.

Is there an easy way to figure out why I get the “relocation truncated to fit” errors?
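
One place to start is to add up the static storage. This assumes the overflow comes from statically allocated data (common blocks and fixed-size arrays) growing past the 2 GB that 32-bit relocations can reach in the default small memory model, which is the usual cause of this message. The sketch below only does the arithmetic for the arrays mentioned above. The 4-byte element size is an assumption (default real), and the full code would need every static array included.

```fortran
program static_size
  use iso_fortran_env, only: int64
  implicit none
  integer(int64), parameter :: elem_bytes = 4_int64   ! assumption: default real
  integer(int64), parameter :: limit = 2_int64**31    ! 2 GB addressing limit
  integer(int64) :: common_arr, big3d, big2d

  common_arr = 1300_int64*115_int64*elem_bytes          ! one atmint array
  big3d      = 6_int64*60_int64*300000_int64*elem_bytes ! the 6,60,300000 array
  big2d      = 300000_int64*30_int64*elem_bytes         ! one 300000 by 30 array

  print *, 'one (maxlines,maxmu) array, bytes:', common_arr
  print *, 'the 6 x 60 x 300000 array, bytes :', big3d
  print *, 'one 300000 x 30 array, bytes     :', big2d
  print *, '2 GB limit, bytes                :', limit
end program static_size
```

If the total over all static arrays approaches the limit, moving the largest ones to allocatable arrays, as suggested earlier in the thread, would take them out of the static data region.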