I have installed the latest PGI release (15.5) and noticed a problem on an Intel box:
model name : Intel® Xeon® CPU E5-2699 v3 @ 2.30GHz
This compile command
pgf90 -mp -O3 -mcmodel=medium -Mextend -o fred_par fred.for
produces an executable that gives the wrong results when run in parallel. If I lower the optimization level to -O1, the program gives the correct results in parallel.
However, this compile command
pgf90 -mp -O3 -tp=p7 -mcmodel=medium -Mextend -o fred_par fred.for
makes an executable that gives the correct results in parallel mode.
Compiling without the -mp flag but still at -O3, as in
pgf90 -O3 -mcmodel=medium -Mextend -o fred fred.for
seems to give the correct results when run on a single CPU.
There is a parallel loop that performs an initial set-up, and this seems to run the same way regardless of the optimization level or the target architecture. However, in the second parallel loop (where basically everything else happens until the code terminates), things go off the rails with -mp -O3 and no -tp specified.
We also have PGI 15.5 installed on an Opteron system:
model name : AMD Opteron™ Processor 6320
and as far as I can tell, I get correct results in parallel there with the -O3 flag and no -tp option.
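For anyone trying to reproduce this, here is a minimal sketch of a script that lays out the four builds mentioned above side by side. It only prints the compile commands unless RUN=1 is set, in which case it assumes pgf90 is on the PATH and fred.for is in the current directory. The output names (fred_par_O3 and so on) and the function names are made up for illustration; the flags are the ones quoted in this post.

```shell
#!/bin/sh
# Sketch: print (or, with RUN=1, execute) the build variants from this post.
# Assumption: pgf90 is on PATH and fred.for is in the current directory.
# Output names and function names are hypothetical, chosen for illustration.

COMMON="-mcmodel=medium -Mextend"

build() {
    # $1 = output name; remaining arguments = the flags that vary per build
    out=$1
    shift
    cmd="pgf90 $* $COMMON -o $out fred.for"
    echo "$cmd"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd
    fi
}

build_all() {
    build fred_par_O3    -mp -O3          # wrong results in parallel on the Xeon
    build fred_par_O1    -mp -O1          # correct results in parallel
    build fred_par_O3_p7 -mp -O3 -tp=p7   # correct results in parallel
    build fred_serial_O3 -O3              # correct results on a single CPU
}

build_all
```

Running each resulting executable on the same input and comparing the outputs should show which flag combinations are affected on a given machine.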