In the manual “PGI® User’s Guide -Parallel Fortran, C and C++ for Scientists and Engineers” there is an example program on pp. 136-137 called “CRITICAL_USE”.
I compile this using
pgf95 -mp critical_use.F -o critical_use
and change the number of threads available using
export OMP_NUM_THREADS=
and vary the number of threads from 1 to 8. I am using pgf90 6.1-1 (64-bit target) on x86-64 Linux; each node has 8 cores sharing 20 GB of memory.
I do not see any speedup as I vary the number of threads. There is another example program in the User’s Guide called “VECTOR_OP” on p. 28 [example 2-3]. With that program I do get a speedup from the vector compile option compared with the scalar version, but I get no speedup when I compile with the “-Mconcur -fastsse” options and vary OMP_NUM_THREADS.
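The thread sweep described above can be sketched as a small shell function. This is only an illustration: the function name sweep_threads and the thread counts 1, 2, 4, 8 are assumptions, not something taken from the User's Guide, and the timing uses whole seconds from date, so it is only meaningful for runs that last at least several seconds.

```shell
# Hypothetical helper: run a command once per thread count and
# print the elapsed wall-clock time in whole seconds.
# Usage: sweep_threads <command> [args...]
sweep_threads() {
    for n in 1 2 4 8; do
        start=$(date +%s)
        # Set OMP_NUM_THREADS only for this one run of the command.
        env OMP_NUM_THREADS="$n" "$@"
        end=$(date +%s)
        echo "threads=$n elapsed=$((end - start))s"
    done
}
```

With the executable built as above, it would be called as: sweep_threads ./critical_use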
I did find another PGI OpenMP example at
http://www.pgroup.com/openmpbench_dir/fftpde/
and this works as expected, i.e. I do get a speedup as I vary OMP_NUM_THREADS.
So my question really goes back to the top of this post: why am I not seeing a speedup in the “critical_use.F” program? Could someone else out there try it and see what they get?
This is important because I have another program I’m working on where I am not seeing any speedup from OpenMP directives, and it is fairly straightforward: I just have a bunch of very trivial loops to parallelize.
Thanks in advance.