Do you have any suggestions for making it faster by using some options in PGI Fortran?
Not knowing anything about your program, I’d recommend “-fast -Mfprelaxed -Mipa=fast,inline”. Intel uses relaxed precision by default at higher optimizations but you need to explicitly add it for PGI (we’re a bit more conservative regarding accuracy). IPA may or may not help, but worth a try.
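As a starting point, the recommended flags could be put together roughly like this. This is only a sketch: the source file name `myprog.f90` and the output name `myprog` are placeholders, and the script runs the compiler only if `pgfortran` and the source file are actually present. Since IPA works at link time as well, the same flags should appear on both the compile and link steps (here a single command does both).

```shell
#!/bin/sh
# Placeholder names (not from the original post); replace with your own.
SRC="myprog.f90"
EXE="myprog"

# Baseline flags suggested above, plus -Minfo for compiler feedback.
BASE_FLAGS="-fast -Mfprelaxed -Mipa=fast,inline"
CMD="pgfortran $BASE_FLAGS -Minfo -o $EXE $SRC"

# Show the command so it can be copied into a makefile.
echo "$CMD"

# Compile only when the compiler and the source file are available.
if command -v pgfortran >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $CMD
fi
```

Because -Mfprelaxed trades some accuracy for speed, it's worth comparing the results against a build without it before relying on the faster binary.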
Other options to try are listed below (please refer to the PGI docs for more detail about each optimization):
Vectorization sub-options: Try partial vectorization (-Mvect=partial), 256-bit SIMD if you’re on a Haswell or Piledriver architecture (-Mvect=simd:256), and removing altcode generation (-Mvect=noaltcode).
Unrolling factors: -Munroll=n:<count> controls the loop unroll factor.
Inlining: review the compiler feedback messages from “-Minfo” and see if any routines are not getting inlined. You can try using the IPA inline suboptions to get more routines inlined, such as “-Mipa=inline:reshape” if you’re passing in sub-arrays or “-Mipa=inline:levels:10” to increase the number of call levels to inline (at the cost of code size).
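One way to try the vectorization and unrolling options above is to build one executable per variant and time each on the same input. The sketch below assumes the placeholder source `myprog.f90`; the unroll count of 4 is an arbitrary illustrative value, not a recommendation. It prints each command and only invokes the compiler if `pgfortran` and the source file are found.

```shell
#!/bin/sh
# Placeholder source name (not from the original post).
SRC="myprog.f90"
BASE="-fast -Mfprelaxed"

COUNT=0
# Each variant adds one option from the list above to the baseline.
# -Munroll=n:4 uses an arbitrary count chosen only for illustration.
for EXTRA in "-Mvect=partial" "-Mvect=simd:256" "-Mvect=noaltcode" "-Munroll=n:4"; do
    COUNT=$((COUNT + 1))
    CMD="pgfortran $BASE $EXTRA -Minfo -o myprog_v$COUNT $SRC"
    echo "$CMD"
    # Build only when the compiler and the source file are available.
    if command -v pgfortran >/dev/null 2>&1 && [ -f "$SRC" ]; then
        $CMD
    fi
done

echo "variants: $COUNT"
```

Changing one option at a time makes it easier to tell which one actually affected the run time; the -Minfo output from each build shows what the compiler did differently.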
Beyond this, I’d profile your code, discover the hotspots, then determine what could be preventing optimization (the -Minfo option helps here).