OpenMPI with MXM 32-bit issue

We have the problem that after roughly 2.147 billion messages, which corresponds to the maximum positive value of an int32_t (2,147,483,647), we cannot receive any more messages.

Compiling OpenMPI with the flag “--with-mxm=/path/to/mxm” causes this problem, while without this flag everything is fine. The problem is reproducible with the attached example code by compiling and running it with the following commands:

$ /path/to/openmpi/bin/mpic++ openmpi_mxm_freeze.cxx -o openmpi_mxm_freeze

$ /path/to/openmpi/bin/mpirun -np 2 openmpi_mxm_freeze
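For reference, here is a minimal sketch of what the attached reproducer presumably looks like; the loop bound, payload, and reporting threshold are assumptions based on the output shown further down, not the original source of openmpi_mxm_freeze.cxx:

// Hedged sketch of the reproducer: rank 0 sends slightly more than 2^31
// one-byte messages to rank 1, and both ranks report their progress near
// the int32_t boundary. Loop bound and payload are assumptions.
#include <mpi.h>
#include <cstdint>
#include <cstdio>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int64_t total = 2147483660LL;        // just past INT32_MAX (assumed)
    const int64_t report_from = 2147483641LL;  // start printing progress here
    char dummy = 0;

    std::printf("%d: ready to run\n", rank);

    for (int64_t i = 1; i <= total; ++i) {
        if (rank == 0) {
            MPI_Send(&dummy, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            if (i >= report_from)
                std::printf("0: reached %lld sends\n", (long long)i);
        } else if (rank == 1) {
            MPI_Recv(&dummy, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (i >= report_from)
                std::printf("1: reached %lld receives\n", (long long)i);
        }
    }

    std::printf("%d: finished\n", rank);
    MPI_Finalize();
    return 0;
}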

Maybe the issue is connected with the following lines from “mxm_def.h”:

typedef uint32_t mxm_tag_t;   /* MXM tag type */
typedef uint32_t mxm_imm_t;   /* MXM immediate data type */
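For illustration only (this is not MXM code, just why the ~2.147 billion figure is suspicious): any count or tag that is ever treated as a signed 32-bit integer cannot exceed 2,147,483,647, which is exactly where the example stops receiving.

// Illustration only, not MXM internals: a 32-bit value interpreted as signed
// typically becomes negative once a count passes INT32_MAX (2,147,483,647).
#include <cstdint>
#include <cstdio>

int main()
{
    uint32_t count = 2147483647u;  // INT32_MAX, where the hang begins
    ++count;                       // one more message
    std::printf("unsigned view: %u\n", count);
    std::printf("signed view:   %d\n", static_cast<int32_t>(count));  // negative on two's-complement platforms
    return 0;
}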

The problem occurs with the newest Mellanox firmware, OFED package and OpenMPI version.

openmpi_mxm_freeze.cxx.zip (558 Bytes)

Hello Thomas,

Did you mean to run this test on one host or on two?

If on two, please add ‘--map-by node’ to the command line and rerun.

I verified that it works:

$ /usr/mpi/gcc/openmpi-1.10.3rc4/bin/mpirun -np 2 --map-by node --display-map -mca pml yalla exec

Data for JOB [1162,1] offset 0

======================== JOB MAP ========================

Data for node: vegas27 Num slots: 16 Max slots: 0 Num procs: 1

Process OMPI jobid: [1162,1] App: 0 Process rank: 0

Data for node: vegas28 Num slots: 16 Max slots: 0 Num procs: 1

Process OMPI jobid: [1162,1] App: 0 Process rank: 1

=============================================================

0: ready to run

1: ready to run

1: reached 2147483641 receives

1: reached 2147483642 receives

1: reached 2147483643 receives

1: reached 2147483644 receives

1: reached 2147483645 receives

1: reached 2147483646 receives

1: reached 2147483647 receives

1: reached 2147483648 receives

1: reached 2147483649 receives

1: reached 2147483650 receives

1: reached 2147483651 receives

1: reached 2147483652 receives

1: reached 2147483653 receives

1: reached 2147483654 receives

1: reached 2147483655 receives

1: reached 2147483656 receives

1: reached 2147483657 receives

1: reached 2147483658 receives

1: reached 2147483659 receives

1: reached 2147483660 receives

1: finished

0: reached 2147483641 sends

0: reached 2147483642 sends

0: reached 2147483643 sends

0: reached 2147483644 sends

0: reached 2147483645 sends

0: reached 2147483646 sends

0: reached 2147483647 sends

0: reached 2147483648 sends

0: reached 2147483649 sends

0: reached 2147483650 sends

0: reached 2147483651 sends

0: reached 2147483652 sends

0: reached 2147483653 sends

0: reached 2147483654 sends

0: reached 2147483655 sends

0: reached 2147483656 sends

0: reached 2147483657 sends

0: reached 2147483658 sends

0: reached 2147483659 sends

0: reached 2147483660 sends

0: finished

I’m checking the one host case.

Alina.

Hi Alina,

we are using Debian Jessie and the MLNX_OFED_LINUX-3.3-1.0.4.0-debian8.3-x86_64 package.

Thomas

Hi Thomas,

Yes, the issue was related to a 32-bit number, but it was something internal, not in the API.

Thank you for reporting this.

Alina.

I see. I will check this and get back to you.

In the meantime, can you please check whether adding one of the following to the command line resolves the hang?

-x MXM_TLS=ud

or

-x MXM_TLS=rc
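For example, combined with the invocation used above, this would look roughly like the following (the binary name and install path are placeholders for whatever you used before):

$ /path/to/openmpi/bin/mpirun -np 2 -mca pml yalla -x MXM_TLS=ud openmpi_mxm_freeze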

Thanks,

Alina.

Hi Alina,

we can confirm that the issue is resolved with the updated MXM version.

Was the problem connected to a 32-bit number, or was it something completely different?

Thanks a lot

Thomas

Hello Alina,

thank you for your response. I meant the case on one host, but I will check the two-host case anyway.

One part of the problem is that, although the InfiniBand network is not involved in the single-host case, the example does not run properly if OpenMPI is compiled with the “--with-mxm” option.

Thomas

Well, that’s interesting.

The case on two hosts works fine:

$ /opt/openmpi-2.0.1-jessie-mxm-mt/bin/mpirun -np 2 -hostfile hostfile --map-by node --display-map -mca pml yalla openmpi_mxm_freeze

Data for JOB [31717,1] offset 0

======================== JOB MAP ========================

Data for node: intel1 Num slots: 1 Max slots: 0 Num procs: 1

Process OMPI jobid: [31717,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0-1]]:[BB/../../../../../../../../../../..][../../../../../../../../../../../..]

Data for node: intel2 Num slots: 1 Max slots: 0 Num procs: 1

Process OMPI jobid: [31717,1] App: 0 Process rank: 1 Bound: socket 0[core 0[hwt 0-1]]:[BB/../../../../../../../../../../..][../../../../../../../../../../../..]

=============================================================

[1474616276.871628] [intel1:7883 :0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 2906.98

[1474616276.903256] [intel2:3181 :0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 3043.73

0: ready to run

1: ready to run

0: finished

1: finished

while the one-host case does not:

$ /opt/openmpi-2.0.1-jessie-mxm-mt/bin/mpirun -np 2 --map-by node --display-map -mca pml yalla openmpi_mxm_freeze

Data for JOB [31494,1] offset 0

======================== JOB MAP ========================

Data for node: intel1 Num slots: 24 Max slots: 0 Num procs: 2

Process OMPI jobid: [31494,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0-1]]:[BB/../../../../../../../../../../..][../../../../../../../../../../../..]

Process OMPI jobid: [31494,1] App: 0 Process rank: 1 Bound: socket 0[core 1[hwt 0-1]]:[../BB/../../../../../../../../../..][../../../../../../../../../../../..]

=============================================================

[1474615276.877829] [intel1:7723 :0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 2971.04

[1474615276.877833] [intel1:7724 :0] sys.c:744 MXM WARN Conflicting CPU frequencies detected, using: 2971.04

0: ready to run

1: ready to run

freeze (no further output)

Since we normally use a single host and only in exceptional cases two or more hosts, a solution for the single-host case would be appreciated.

Both options work, although the “ud” option is significantly slower.

Hi Thomas,

I would like to provide you with an updated version of MXM which should fix the problem.

Can you please tell me what OS and Mellanox-OFED you are using?

Thank you,

Alina.

Hi Thomas,

Here is a link to an updated MXM version:

http://bgate.mellanox.com/mxm/mxm/

After installation, MXM will be located in /opt/mellanox/mxm.

The fix will be part of MXM’s January release.

Please let me know if this works well for you.

Thank you,

Alina.