OpenMPI + OpenMP problem

Hello.

I’m struggling with OpenMPI.

We have three Linux clusters, and one of them is relatively new.

So the versions of OpenMPI installed on each cluster are also different.

The problem is that the behavior of OpenMPI also differs between the clusters.

Our program couples OpenMP and MPI.

Each node runs one MPI process, and within each node OpenMP does the multithreading.

On the old cluster with OpenMPI 1.4.3, the following command works well.

mpirun -np 6 -machinefile mf4 ./nTracerV100 Pb3_3D_26G.inp > log_Pb3_26G

It generates 6 processes on 6 nodes, and on each node 16 threads are distributed across 16 cores.

But on the new cluster with OpenMPI 1.8.3, that command does not work well.

It puts all processes on one node, and each process launches all of its threads on a single core.

Even when I add the -pernode option, OpenMP still launches all 16 threads on one core only.

Since I am using exactly the same program and command, it seems that this problem comes from the version difference.

Can anyone give me some advice on what I should check?

One difference between the two versions that I know of is that the default binding is none in versions older than 1.8.

But adding -bind-to none gives me errors such as a non-zero exit code.

Make sure you are setting

-mp

when you link as well as when you compile. If not, you could be linking dummy OpenMP routines that do not distribute work across threads.

Also make sure that the MPI node list is being used, and that OpenMPI is not running on a single node because all the other nodes are unknown to mpirun.

dave

The machine file mf4 contains the node list.

And OpenMP is generating multiple threads, but they all run on the same core.

With top -H I could confirm that the threads are running, each with CPU usage of around 6.7%, which is about what you would expect when 16 threads share a single core (100% / 16 ≈ 6.25%).

Also, if I run the code without MPI, OpenMP works fine.

A while back I managed to get OpenMP working in combination with OpenMPI:

  1. Each computer has the same version of PGI, and OpenMPI was built in the same way on all hosts.

  2. Each computer has a directory tree where the input files are, for example /home/orosz/dir1/. Each computer has copies of all of the input configuration files.

  3. The code is compiled on each host, each with -mp.

  4. These environment variables are set:

setenv OMP_NUM_THREADS 4
setenv OMP_STACKSIZE 12000000
setenv OMP_SCHEDULE dynamic
limit stacksize unlimited

  4a. The code can read an optional file that sets OMP_NUM_THREADS and the starting and ending indices of the main parallel loop. Thus one host can launch 40 threads and another host 8 threads (see the sketch after this list).

  5. The MPI job is launched like this:

mpirun --bind-to none -np 4 -host hostname1,hostname2,hostname3,hostname4 -x OMP_NUM_THREADS -x OMP_STACKSIZE -x OMP_SCHEDULE codename_MPI >& screenout
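
As a rough sketch of the optional per-host thread-count file mentioned in step 4a (not the actual code; the file name nthreads.txt and its one-integer format are assumptions for illustration), the idea in C looks something like this:

/* Hypothetical sketch of step 4a: if an optional file exists on this host,
 * use it to override the thread count before the parallel region starts. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 0;
    FILE *fp = fopen("nthreads.txt", "r");   /* assumed file name */

    if (fp) {
        if (fscanf(fp, "%d", &n) == 1 && n > 0)
            omp_set_num_threads(n);          /* overrides OMP_NUM_THREADS */
        fclose(fp);
    }

    #pragma omp parallel
    {
        #pragma omp single
        printf("this host is using %d threads\n", omp_get_num_threads());
    }
    return 0;
}

Because each host reads its own local copy of the file, one host can run with 40 threads while another runs with 8.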

It sounds like you might be having problems with the thread binding?

I sent you a program to determine if threads are running on different cores.
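
(For reference, a minimal checker along these lines, not necessarily the exact program that was sent: each OpenMP thread of each MPI rank reports the core it is currently on via sched_getcpu(), which is Linux-specific.)

#define _GNU_SOURCE
#include <sched.h>      /* sched_getcpu(), Linux-specific */
#include <stdio.h>
#include <unistd.h>     /* gethostname() */
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int rank;
    char host[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    gethostname(host, sizeof(host));

    /* If binding is wrong, every thread of a rank prints the same core. */
    #pragma omp parallel
    printf("host %s  rank %d  thread %d/%d  core %d\n",
           host, rank, omp_get_thread_num(), omp_get_num_threads(),
           sched_getcpu());

    MPI_Finalize();
    return 0;
}

Build it with the MPI compiler wrapper and the OpenMP flag (for example mpicc -mp checker.c -o checker with the PGI toolchain), launch it with the same mpirun line as the real code, and check whether the threads of each rank report different cores.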

If I understand this thread, you are building an executable ‘foo’ on one system and running the executable on 3 different systems, and it is failing when the OpenMPI running on the system does not match the version the executable was linked with?

Are the MPI files linked statically, or are they dynamically linked at runtime?
If the latter, look at the output of

ldd foo

on each cluster.


The second question is whether the OpenMPI that is included with the PGI products has been used - are you using the mpif90 in the $PGI/linux86-64/2016/mpi/openmpi/bin directory? If you do that, do things work as expected?

If you build each of the OpenMPI versions with the PGI compilers, and you use the MPI drivers (mpif90, mpic++, etc.) created by that build to build the executable, does it run?

dave

Hi Dave,

I have no problems running combined OpenMP and OpenMPI codes. I was responding to CNJ above. It sounded to me like CNJ was having a problem I had early on, where OpenMP would not launch properly on the individual nodes. I ended up having to use the --bind-to none flag to get everything working.

Jerry

Thanks for your advice.

I was able to solve the problem by adding

-map-by node:PE=16

to the mpirun command. As I understand it, this maps one process to each node and binds each process to 16 processing elements (cores), so the 16 OpenMP threads of each process can spread across the node's cores instead of piling up on a single core.