OpenMPI MXM problem

I am probably making a stupid error, but I don’t really know where I should look.

This is all on RHEL 6.5.

I have previously used both HPC-X and compiled OpenMPI against libmxm (yalla driver).

HPC-X 1.3.336 works well for me.

Now I am trying to install HPC-X 1.5.370 and also compile OpenMPI 1.10.2. All efforts have resulted in code that hangs shortly after MPI_Init(). I compiled the Intel IMB benchmark and ran it on 2 tasks using the yalla driver, and it hangs in the first MPI_Bcast(), which is the first communicating routine after the initial setup (MPI_Init/MPI_Comm_size/MPI_Comm_rank).

If I disable libmxm and use "-mca pml ob1 -mca btl openib,self,sm" the program runs correctly.
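For reference, the working fallback run looks roughly like this (the hostnames and the path to the IMB binary are just placeholders for my setup):

$ mpirun -np 2 -host node01,node02 -mca pml ob1 -mca btl openib,self,sm ./IMB-MPI1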

I have tried two different versions of libmxm:

HPC-X 1.3.336: MXM_VERNO_STRING “3.3.3055”

HPC-X 1.5.370: MXM_VERNO_STRING “3.4.3079”

If I build OpenMPI 1.10.2 using v 3.3 of mxm I get a working implementation with yalla.

If I use HPC-X 1.3.336, everything also works fine with yalla.

If I run HPC-X 1.5.370 or if I build OpenMPI 1.10.2 against the 3.4 version of mxm I get the problem.

The software installed in /opt/mellanox and the related packages are at the same level as HPC-X 1.5.370.

Does anyone on this list have a suggestion as to what my problem may be and/or how to diagnose it?

Hi Nils,

I think you can try the IMB test included in HPC-X, which is in $HPCX_MPI_TESTS_DIR:

mpirun -mca pml yalla -np 2 ${HPCX_MPI_TESTS_DIR}/imb/IMB-MPI1

Could it possibly be that there are other MPI libraries in your LD_LIBRARY_PATH?
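One quick way to check (just a sketch, reusing the IMB binary from the command above):

$ echo "$LD_LIBRARY_PATH" | tr ':' '\n'
$ ldd ${HPCX_MPI_TESTS_DIR}/imb/IMB-MPI1 | grep -Ei 'mpi|mxm'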

Have you tried running 2 processes on a single node, and does that work?

Hi Nils,

It turns out that you need to add "-x MXM_OOB_FIRST_SL=0" to your mpirun command on your cluster.

Otherwise, if you do a pstack on a process while it is hung, you will find that the process is stuck in some routine in hcoll, because hcoll uses the pml for OOB messaging.
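For example (12345 is a placeholder for the PID of one of the hung ranks, run on the node where it is hung):

$ pstack 12345

and look for frames from hcoll in the backtrace.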

Anyway, this works for me:

$ mpirun -np 2 -host nxt0111,nxt0110 -x MXM_OOB_FIRST_SL=0 ${HPCX_MPI_TESTS_DIR}/imb/IMB-MPI1

I see the same behaviour using the version of IMB provided in the gcc build of HPC-X 1.5.370. It runs fine on a single node, but hangs if scheduled across two nodes.

My LD_LIBRARY_PATH only contains pointers to HPC-X (line breaks inserted by me below for readability):

LD_LIBRARY_PATH=/lsf/9.1/linux2.6-glibc2.3-x86_64/lib:
/hpc/base/ctt/packages/hpcx/1.5.370/gcc/hcoll/lib:
/hpc/base/ctt/packages/hpcx/1.5.370/gcc/fca/lib:
/hpc/base/ctt/packages/hpcx/1.5.370/gcc/mxm/lib:
/hpc/base/ctt/packages/hpcx/1.5.370/gcc/ompi-v1.10/lib

Does the same failure occur if the pre-compiled OpenMPI is used?

Did you try to avoid LSF and run mpirun directly? Maybe allocate the nodes using LSF and then run the MPI job from another terminal.

Try adding LD_LIBRARY_PATH to the mpirun command, something like mpirun -x LD_LIBRARY_PATH.
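For example, something along these lines (reusing the hosts and the IMB binary from the earlier commands):

$ mpirun -np 2 -host nxt0110,nxt0111 -x LD_LIBRARY_PATH -mca pml yalla ${HPCX_MPI_TESTS_DIR}/imb/IMB-MPI1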

Does the same failure occur if the pre-compiled OpenMPI is used?

Yes - that is what I meant by running HPC-X.

Did you try to avoid LSF and run mpirun directly? Maybe allocate the nodes using LSF and then run the MPI job from another terminal.

No, I did not try that. What kind of incompatibility between LSF and libmxm are you suggesting would produce the behaviour I see?

Try adding LD_LIBRARY_PATH to the mpirun command, something like mpirun -x LD_LIBRARY_PATH.

That is how I always run. It is the only way to be at least somewhat certain of what is going to be used during execution. It is also the only way to have several different runs with different set-ups queued and still know the circumstances under which they were executed.