I have a Fortran (77/90) code that was originally pure MPI, built specifically for 32 MPI tasks, to which I have added OpenMP directives. Because the code in that subroutine is predominantly F77, the OpenMP directives are of the form “C$OMP”.
To confirm the results had not changed, I ran the same case with OMP_NUM_THREADS=1, compiling with -O0 (as for the pure MPI version) and also with -mp=nonuma. The compiler version is 9.0.4, on a Cray XT (Barcelona quad-cores).
A comparator tool shows that there are differences in the results.
Then I disabled the OpenMP directives by putting "CCC " in the first four columns, recompiled, and again saw discrepancies. I am unhappy about that.
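To make the two directive forms concrete, here is a minimal fixed-form sketch. It is a made-up loop, not taken from the real code: the program name, the arrays A and B, and the loop itself are placeholders, and the exact text of the disabled line is my reconstruction of what overwriting the first four columns produces.

```fortran
      PROGRAM SKETCH
C     Hypothetical stand-in for one loop in the real subroutine.
      INTEGER N, I
      PARAMETER (N = 1000)
      DOUBLE PRECISION A(N), B(N)
      DO 10 I = 1, N
         B(I) = DBLE(I)
   10 CONTINUE
C     Active form: the sentinel C$OMP sits in columns 1-5, so the
C     line is treated as a directive only when OpenMP is enabled.
C$OMP PARALLEL DO PRIVATE(I) SHARED(A, B)
      DO 20 I = 1, N
         A(I) = 2.0D0 * B(I)
   20 CONTINUE
C$OMP END PARALLEL DO
C     Disabled form: with "CCC " over the first four columns the
C     sentinel is gone and the line below is an ordinary comment.
CCC P PARALLEL DO PRIVATE(I) SHARED(A, B)
      PRINT *, A(1), A(N)
      END
```

With the directives disabled like this, I would expect the source seen by the compiler to be the same as the pure MPI version apart from comments.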
Next I omitted “-mp=nonuma”, and then the results were identical, probably because the code is compiled essentially identically to the pure MPI version at that point. This is puzzling, mainly because I do not understand what the -mp option is doing when the OpenMP directives are disabled in the code.
The code is large (>80000 lines in one file and 330 subroutines), but I was only attempting OpenMP in one subroutine at a time (being cautious).
I had wound down the optimization from “-fast” to reduce the complexity of this issue, but it still seems to be a problem. How am I supposed to have confidence in a hybrid code if I cannot get a single-threaded version to agree with the “standard” MPI run? (Eventually I expect to run 64 MPI tasks with as many as 24 OpenMP threads on the next generation of the machine.)
Any ideas or clarification about -mp=nonuma would be appreciated.