I’m still searching for a way to parallelize my code; CUDA and OpenACC both had issues, so I’m looking at OpenMP (running version 13.2 of PGI Fortran on Windows 7). But the Universe isn’t making it easy on me.
If I compile my code with the following command:
pgfortran -o code.exe -mp -Minfo=mp code.f90
then the code crashes before it gets to the parallel region. In fact, it crashes during variable initialization for a called subroutine (inlining doesn’t change this). If I run the code in PGI’s debugger, I get the following error message:
Signalled ACCESS_VIOLATION at 0x1400E4E4C, function _builtin_stinit
Which points me to the following line of assembly code (if this even matters):
50 pushq %rax
Further observations:
If I leave off the OpenMP tags everything works fine.
I can remove all code associated with OpenMP, including the !$omp lines themselves, and still get this behavior if I compile with the -mp flag.
I still get this error even after I’ve increased my stack size to 512MB using the (DOS) command “set OMP_STACKSIZE=512M”.
What’s going on here?
Edit: Is this even the right forum for this question? Should I have posted in “Programming and Compiling”?
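It looks like a stack overflow to me, given that the access violation occurs while pushing a value onto the stack.
On Windows, you need to set the stack size at link time using the “-stack” flag. OMP_STACKSIZE only sets the stack size of the threads that OpenMP creates, not of the initial thread, which is where your crash occurs since it happens before the parallel region.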
PGI$ pgf90 -help -stack
Reading rcfile C:\PROGRA~1\PGI\win64\13.2\bin\pgf90_rc
-stack=[no]check|<reserve>|<commit>
Set stack reserve and commit sizes at link time
[no]check Disable run-time stack check
You may need to experiment with the exact size to use, but this is the setting I usually start with:
-stack=nocheck,39000000,39000000
The “nocheck” sub-option may improve your performance a bit at the cost of reserving the entire commit space upon load rather than incrementally adding it as needed during run time.
The local IT guy told me that he thought OpenMP did something weird with the stack, and after some testing I’m inclined to agree.
Using the -mp flag, I need to reserve somewhere between 500MB and 600MB for the stack in order to not get access violation errors at runtime. This is for a Monte Carlo code whose largest array is 100K elements. Admittedly, there are 13 of them, but that doesn’t add up to 500MB. Do you have any wisdom you can pass on to me?
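To put a number on "that doesn’t add up to 500MB", here is a quick back-of-the-envelope sketch. The element size is an assumption (8-byte double precision; the post does not say what type the arrays are), and the program and variable names are made up for illustration.

```fortran
program array_footprint
  implicit none
  integer, parameter :: n_arrays = 13        ! number of large arrays, from the post
  integer, parameter :: n_elems  = 100000    ! "100K elements" per array, from the post
  integer, parameter :: bytes_per_elem = 8   ! ASSUMPTION: double precision elements
  integer :: total_bytes

  ! Total size of the array data alone, ignoring any compiler temporaries
  total_bytes = n_arrays * n_elems * bytes_per_elem

  print '(a,f6.1,a)', 'array data: ', real(total_bytes) / 1.0e6, ' MB'
end program array_footprint
```

Under that assumption the arrays come to roughly 10 MB, and about half that for 4-byte elements, so the array data by itself is nowhere near the 500MB to 600MB of stack that the -mp build needs.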
Even once I’ve reserved enough space, I’m also seeing weirdness with how common block variables are handled with the OpenMP flag. If I use “-mp”, a particular common block variable loses its value between the main program and a called subroutine. Without the “-mp” flag, everything works as expected. Any thoughts on this also?
Quoting myself because I may have found the issue. This variable was declared private at the start of the OpenMP block and set inside it. However, a private variable is a separate per-thread copy, so the assignment never reaches the instance stored in the common block. The subroutine accesses the variable through the common block, so what it sees is the uninitialized common block copy rather than the initialized private copy belonging to the calling thread.
Easy enough to test, but I don’t have time right now.
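For anyone who wants to try it, a minimal test of this hypothesis might look like the sketch below. It is not taken from the actual code: the common block name (shared_blk), the variable (seed_val), and the subroutine (show_seed) are all hypothetical stand-ins. It would be compiled with -mp like the original.

```fortran
! Sketch only: all names here are hypothetical, not from the original code.
program common_private_demo
  implicit none
  integer :: seed_val
  common /shared_blk/ seed_val

  seed_val = -1              ! sentinel meaning "never set"

  ! PRIVATE gives each thread its own copy of seed_val. That copy is
  ! separate storage, so the assignment below does not touch /shared_blk/.
  !$omp parallel private(seed_val)
  seed_val = 42
  call show_seed()           ! reads seed_val through the common block
  !$omp end parallel
end program common_private_demo

subroutine show_seed()
  implicit none
  integer :: seed_val
  common /shared_blk/ seed_val

  ! Serialize the output so lines from different threads do not interleave
  !$omp critical
  print *, 'subroutine sees seed_val =', seed_val
  !$omp end critical
end subroutine show_seed
```

If the hypothesis is right, the subroutine should report the sentinel rather than 42. In that case the usual OpenMP remedy is to drop the variable from the private clause and instead mark the common block threadprivate, with a `!$omp threadprivate(/shared_blk/)` directive placed after the common statement in every program unit that declares the block, so that each thread gets its own copy of the block itself.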