I am having a problem with my large modeling code, which may be related to problems I have had earlier. The code runs in parallel using OpenMP. The number of computations done, and therefore the run time, depends strongly on the specific data set being modeled. The code has been running fine on a small data set (i.e. fast runtime), in both serial and parallel mode.
Recently I tried to run the code on a much larger data set. The code ran in serial mode with no problems (other than being relatively slow). However, in parallel mode the code hangs relatively early on. On occasion it will report this error:
Error: _mp_pcpu_reset: lost thread
If I compile the code with gfortran, it runs on the larger data set.
I have tried some basic tests, like compiling with the -Mbounds flag, but so far nothing comes up. I have also tried using the debugger, but without much success: the code would hang, and I could not find a way to inspect anything useful about the state of the variables, etc.
I am using PGI version 16.5, and doing my tests on a Xeon system [Intel® Xeon® CPU E5-1650 v3 @ 3.50GHz].
Since the code runs in serial mode, one would assume the troubles are related to the stack. I have tried altering the stacksize, but that does not seem to help. Does anyone have any suggestions on what else I can do to figure this out?
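For reference, stack size in an OpenMP run is normally controlled in two places: the shell limit, which applies to the main thread, and the standard OMP_STACKSIZE environment variable, which applies to the worker threads. Below is a minimal sketch of that kind of setup. The sizes, the thread count, and the executable name ./model are placeholders for illustration, not the exact values from my runs.

```shell
# Raise the shell stack limit (affects the main/master thread).
# This can fail if the hard limit is lower, so fall back to a message.
ulimit -s unlimited 2>/dev/null || echo "could not raise the shell stack limit"

# Stack size for each OpenMP worker thread (standard OpenMP variable).
# 512M is an illustrative value only.
export OMP_STACKSIZE=512M

# Thread count, also illustrative.
export OMP_NUM_THREADS=4

# Show what is in effect before launching the run.
echo "shell stack limit: $(ulimit -s)"
echo "OMP_STACKSIZE=$OMP_STACKSIZE OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Launch the program (hypothetical executable name, left commented out):
# ./model
```

Both settings have to be made in the same shell that launches the program; setting only the shell limit leaves the worker threads at their default stack size.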