OpenMP example programs not working as expected

In the manual “PGI® User’s Guide: Parallel Fortran, C and C++ for Scientists and Engineers” there is an example program on pp. 136-137 called “CRITICAL_USE”.

I compile this using
pgf95 -mp critical_use.F -o critical_use
and change the number of threads available using
export OMP_NUM_THREADS=<n>
and vary the number of threads from 1 to 8. I have pgf90 6.1-1 (64-bit target) on x86-64 Linux; there are 8 cores available per node, sharing 20 GB of memory per node.

I do not see any speedup when varying the number of threads. There is another example program in the User’s Guide called “VECTOR_OP” on p. 28 [example 2-3]. I can get a speedup in that program using the vector compile option versus the scalar version, but I cannot get any parallel speedup when compiling it with the “-Mconcur -fastsse” options and varying OMP_NUM_THREADS.
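
(For context, the loop in “VECTOR_OP” is a simple array operation, something like the following; this is a sketch from memory with made-up names, not the guide’s exact listing:)

program vector_op_sketch
   implicit none
   integer, parameter :: n = 10000000
   integer :: i
   real, allocatable :: x(:), y(:)
   allocate(x(n), y(n))
   y = 1.0
   ! a straightforward loop: -fastsse should vectorize it,
   ! and -Mconcur may auto-parallelize it across threads
   do i = 1, n
      x(i) = 4.0*y(i) - 1.0
   end do
   print *, x(1), x(n)
end program vector_op_sketch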

I did find another PGI OpenMP example at
http://www.pgroup.com/openmpbench_dir/fftpde/
and this works as expected, i.e. I do get a speedup as I vary OMP_NUM_THREADS.

So my question really goes back to the top of this page: why am I not seeing a speedup in the “critical_use.F” program? Could someone else out there try it and see what they get?

This is important because I’ve got another program I’m working on where I am not seeing any speedup from OpenMP directives, and it’s fairly straightforward: just a bunch of very trivial loops to parallelize, roughly the shape sketched below.
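
(A simplified, self-contained sketch of the kind of loop I mean; the real arrays and bounds differ:)

program trivial_loops
   implicit none
   integer, parameter :: n = 50000000
   integer :: i
   real, allocatable :: a(:), b(:), c(:)
   allocate(a(n), b(n), c(n))
   a = 1.0
   b = 2.0
   ! each iteration is independent, so OpenMP can split the range
!$omp parallel do
   do i = 1, n
      c(i) = a(i) + 2.0*b(i)
   end do
!$omp end parallel do
   print *, c(1), c(n)
end program trivial_loops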

Thanks in advance.

Hi haferman,

So my question really goes back to the top of this page: why am I not seeing a speedup in the “critical_use.F” program? Could someone else out there try it and see what they get?

How are you measuring the runtime? The example is not meant to show parallel speed-up, but rather how critical sections can be used.
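
A critical section lets only one thread at a time into the enclosed block, so a loop whose body is dominated by a critical section runs essentially serially no matter how many threads you give it. A minimal sketch of the pattern (not the guide’s actual listing):

program critical_sketch
   implicit none
   integer :: i
   double precision :: total
   total = 0.0d0
!$omp parallel do
   do i = 1, 10000000
!$omp critical
      ! only one thread at a time executes this update, so it is
      ! race-free but effectively serialized
      total = total + sqrt(dble(i))
!$omp end critical
   end do
!$omp end parallel do
   print *, 'total =', total
end program critical_sketch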

  • Mat

I’m simply measuring wall clock time using the “time” command.

I ask because, for me at least, the example takes less than a second, even with 1 thread. Have you modified it so that it will run longer?
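
Also, “time” measures the whole run, including program start-up. To time just the region under test you can use omp_get_wtime from omp_lib; a minimal, self-contained sketch:

program wtime_demo
   use omp_lib
   implicit none
   integer :: i
   double precision :: s, t0, t1
   s = 0.0d0
   t0 = omp_get_wtime()          ! wall-clock time before the region
!$omp parallel do reduction(+:s)
   do i = 1, 100000000
      s = s + dble(i)
   end do
!$omp end parallel do
   t1 = omp_get_wtime()          ! wall-clock time after the region
   print *, 'sum =', s, '  elapsed =', t1 - t0, ' seconds'
end program wtime_demo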

  • Mat

Takes about 3 seconds for me, independent of the number of threads. I put another loop (k = 1,10) outside of the !$OMP directives (roughly the shape sketched below), and it takes 24 seconds with 1 thread, speeding up only to 20 seconds with 4 threads… which doesn’t give me the warm fuzzies that the work is being split up efficiently…
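
(Schematically, the modification has this shape; a sketch of the structure only, assuming the work inside is a critical-section update like the one above, not the guide’s actual code:)

program critical_repeat
   implicit none
   integer :: i, k, hits
   hits = 0
   do k = 1, 10                  ! extra outer repeat loop, outside the directives
!$omp parallel do
      do i = 1, 1000000
!$omp critical
         hits = hits + 1         ! the critical section still serializes this work
!$omp end critical
      end do
!$omp end parallel do
   end do
   print *, 'hits =', hits
end program critical_repeat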

On the other hand, the example at http://www.pgroup.com/openmpbench_dir/fftpde/
takes about 3 seconds with 1 thread, 2 seconds with 2 threads, 1.5 seconds with 3 threads, and 1 second with 4 threads, so even though it runs fast to begin with, the speedup is obvious…