Poor PGI OpenMP performance?

I have a Fortran CFD program parallelized with OpenMP. When it is compiled with Intel Fortran, I achieve a speedup of almost 10 on two Intel Xeon X5670 CPUs (12 cores in total). When I compile it with PGI (version 11.8), the speedup is less than 5. I use the -O3 option with both compilers. For the sequential program, PGI Fortran is about 20% slower than Intel Fortran. More surprisingly, if I use PGI's -fast option, I get wrong results with 12 OpenMP threads, although the results are still correct with fewer than 12 threads.
So what is the difference between the Intel and PGI OpenMP implementations? Can anybody give me some advice on how to improve PGI OpenMP performance?

Answered in cross-post: poor pgi openmp performance??

  • Mat