PGI-compiled openmpi --with-psm2 failed to run on CentOS 7.5

Hi,

On a CentOS 7.5-based cluster with Intel Omni-Path, I compiled openmpi-3.0.1 --with-psm2 with the PGI 14.6 compiler successfully, but it fails at runtime. I had no problem on the same cluster under CentOS 7.4.

I have also tried the combination of the newest PGI 18.4 and openmpi-2.1.3 and got a similar runtime error.

Both OpenMPI versions built --with-psm2 using the GNU or Intel compilers run without any problem.
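For reference, the PSM2-enabled builds were configured roughly like this (a sketch only; the install prefix and make parallelism are placeholders, not the exact commands used):

# sketch: building OpenMPI with PSM2 using the PGI compilers
./configure CC=pgcc CXX=pgc++ FC=pgfortran --with-psm2 \
    --prefix=/opt/openmpi-3.0.1-pgi
make -j8 && make install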

The runtime error from the PGI build looks like this:

[tetuser@n0 examples]$ mpirun -H n0,n1 -np 2 ./hello_c
n0.27112hfi_userinit: mmap of status page (dabbad0008030000) failed: Operation not permitted
n0.27112psmi_context_open: hfi_userinit: failed, trying again (1/3)
n0.27112hfi_userinit: assign_context command failed: Invalid argument
n0.27112psmi_context_open: hfi_userinit: failed, trying again (2/3)
n0.27112hfi_userinit: assign_context command failed: Invalid argument
n0.27112psmi_context_open: hfi_userinit: failed, trying again (3/3)
n0.27112hfi_userinit: assign_context command failed: Invalid argument
n0.27112PSM2 can’t open hfi unit: -1 (err=23)

PSM2 was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.

Error: Failure in initializing endpoint

n1.27115hfi_userinit: mmap of status page (dabbad0008030000) failed: Operation not permitted
n1.27115psmi_context_open: hfi_userinit: failed, trying again (1/3)
n1.27115hfi_userinit: assign_context command failed: Invalid argument
n1.27115psmi_context_open: hfi_userinit: failed, trying again (2/3)
n1.27115hfi_userinit: assign_context command failed: Invalid argument
n1.27115psmi_context_open: hfi_userinit: failed, trying again (3/3)
n1.27115hfi_userinit: assign_context command failed: Invalid argument
n1.27115PSM2 can’t open hfi unit: -1 (err=23)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here’s some
additional information (which may only be relevant to an Open MPI
developer):

PML add procs failed
--> Returned "Error" (-1) instead of "Success" (0)

*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[n0:27056] 3 more processes have sent help message help-mtl-psm2.txt / unable to open endpoint
[n0:27056] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[n0:27056] 1 more process has sent help message help-mpi-runtime.txt / mpi_init:startup:internal-failure
[tetuser@n0 examples]$



Please help: is PGI support for PSM2 broken on CentOS 7.5?

Thank you!

Limin

Hi Limin,

We don’t have a system with Omni-Path, so unfortunately we can’t test this to understand what the issue is.

Note that CentOS 7.5 was released after PGI 18.4, so we don’t yet officially support it; the problem could be with CentOS 7.5 itself rather than with OpenMPI.

Are you able to run other non-MPI codes built with PGI?

Are you able to use the OpenMPI that PGI ships with the compilers? It doesn’t use PSM2, so it won’t be optimized for your network, but it should help narrow down whether the problem is with PSM2 or something else.

-Mat

Hi Mat,

I have tried compiling openmpi-3.0.1 --without-psm2 on CentOS 7.5 with PGI, and everything works fine.

Compiling openmpi-3.0.1 --with-psm2 on CentOS 7.4 with PGI works fine too.

So the problem really is the combination of --with-psm2, CentOS 7.5, and PGI.

Thank you for your help!

Limin

So the problem really is the combination of --with-psm2, CentOS 7.5, and PGI.

At least that’s some good news, though I’m not sure how much help I can be with diagnosing what’s wrong with PSM2.

n0.27112hfi_userinit: mmap of status page (dabbad0008030000) failed: Operation not permitted

Is there a support forum for PSM2, or documentation that could shed some insight into what this error means? I tried doing a web search but couldn’t find anything.

-Mat

I am facing a similar issue with openmpi-3.1.0 and PGI 18.4 on RHEL 7.3.

Is there any solution or workaround for this issue?

regards,
Vivek

We found that this is due to the execute bit being set on the GNU_STACK program header of the executable (https://stackoverflow.com/questions/32730643/why-in-mmap-prot-read-equals-prot-exec).

This was triggered by an update to the hfi1 driver (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=12220267645cb7d1f3f699218e0098629e932e1f).

There are two workarounds for this. The first is to link with -Wl,-z,noexecstack. The second is to run execstack -c a.out to clear the execute bit on an existing binary.
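For example, something along these lines can be used to check for the flag and apply either workaround (a sketch; a.out and hello.c are placeholders for the real binary and source):

# 1. Inspect the binary: an executable stack shows up as RWE flags on the GNU_STACK entry
readelf -lW a.out | grep GNU_STACK

# 2a. Workaround 1: relink so the stack is not marked executable
pgcc hello.c -o a.out -Wl,-z,noexecstack

# 2b. Workaround 2: clear the execute bit on the existing binary
#     (execstack is provided by the prelink package on RHEL/CentOS)
execstack -c a.out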

Hi lee218llnl,

Thank you for your suggestion,

Using the workaround of linking with -Wl,-z,noexecstack, I was able to get rid of the error.

I am now able to run my application with openmpi-3.1.0 and PGI 18.4 with PSM2.

Thanks,
Vivek

Hi,

I got rid of the PSM2-related error after I used the link flag -Wl,-z,noexecstack.

I was able to complete a short run of 15-20 minutes, but when I tried a longer run I ended up with another error, shown below. It is encountered after ~45 minutes of successful execution.

*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x28
[ 0] /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libpthread.so.0(+0xf370)[0x7fffead6e370]
[ 1] ~/vivek/SW/PGI/linux86-64/18.4/lib/libaccnprof.so(kernelMap_lookup+0x2b)[0x7fff870f0bae]
[ 2] ~/vivek/SW/PGI/linux86-64/18.4/lib/libaccnprof.so(activityCallback+0x10a)[0x7fff870f0dca]
[ 3] ~/vivek/SW/PGI/linux86-64/2018/cuda/9.0/lib64/libcupti.so(+0x17efa2)[0x7fff86aaafa2]
[ 4] ~/vivek/SW/PGI/linux86-64/2018/cuda/9.0/lib64/libcupti.so(+0x38ac39)[0x7fff86cb6c39]
[ 5] /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libpthread.so.0(+0x7dc5)[0x7fffead66dc5]
[ 6] /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libc.so.6(clone+0x6d)[0x7fffe9e8b73d]
*** End of error message ***

mpirun noticed that process rank 5 with PID 243513 on node r3i7n7 exited on signal 11 (Segmentation fault).

I had not seen this error when I compiled the application without the -Wl,-z,noexecstack flag and ran on a single node, but that build would not run across multiple nodes, as described in the original error of this discussion.

I am using OpenMPI 3.0.0 + PGI 18.4, and I have used the following flags to compile the application:
FFLAGS = -O3 -acc -ta=nvidia,fastmath,cc70,host,time,tesla:maxregcount:64 -Minfo=accel -mcmodel=medium -Wl,-z,noexecstack

Is there anything else I should take care of in this scenario?

Thanks in advance
~Vivek

Hi Vivek,

The segfault seems unrelated to the driver issue. Can you try removing the "time" sub-option? It enables the simple device profiler, and given that the segfault is in the profiler libraries, my guess is that at the larger run size you’re overflowing the profiler’s buffers.
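For example, keeping your other options and just dropping time, that would be something like:

FFLAGS = -O3 -acc -ta=nvidia,fastmath,cc70,host,tesla:maxregcount:64 -Minfo=accel -mcmodel=medium -Wl,-z,noexecstack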

-Mat

The use of noexecstack does, indeed, mitigate the issue in some cases. However, in one particular case I’ve encountered:

  • Open MPI (2.1.x - 3.x versions)
  • Portland Group compilers (17, 18)

Recompiling a simple MPI program with -Wl,-z,noexecstack addressed the mapping of the HFI capabilities pages, but the program died with a segmentation fault shortly after worker 0 started:

Program terminated with signal 11, Segmentation fault.
#0  0x00002b2f8d1419d8 in ompi_mtl_psm2_progress () at ./mtl_psm2.c:426
426	        completed++;
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.168-8.el7.x86_64 elfutils-libs-0.168-8.el7.x86_64 glibc-2.17-196.el7_4.2.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-9.el7.x86_64 libibverbs-13-7.el7.x86_64 libnl3-3.2.28-4.el7.x86_64 libpsm2-10.3.35-1.x86_64 librdmacm-13-7.el7.x86_64 numactl-libs-2.0.9-6.el7_2.x86_64 systemd-libs-219-42.el7_4.10.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00002b2f8d1419d8 in ompi_mtl_psm2_progress () at ./mtl_psm2.c:426
#1  0x00002b2f79ebab05 in opal_progress () at runtime/opal_progress.c:227
#2  0x00002b2f8c90220d in ompi_request_wait_completion () at ../../../../ompi/request/request.h:412
#3  0x00002b2f8c9008cc in mca_pml_cm_recv () at ./pml_cm.h:213
#4  0x00002b2f78356b2a in PMPI_Recv () at ./precv.c:79
#5  0x0000000000401583 in main (argc=2, argv=0x7ffc9a98f3b8) at /home/1001/sw/mpibounce_2/mpibounce.c:100

The segfault did not vary with the version of Open MPI underneath the program: it always occurred at the increment of completed, which follows a call to psm2_mq_test2(). Since completed is a local variable (on the stack), the PSM2 library must be doing something to the stack that conflicts with the stack handling emitted by the PGI compiler (which compiled the code surrounding the call to psm2_mq_test2()).

This made me recall another issue we’d encountered: Gaussian Inc.’s use of -tp nehalem with the PGI 18 compiler produced code that was numerically unstable on Skylake processors for certain inputs. Switching to -tp haswell seemed to address that issue, indicating that certain Nehalem-era optimizations must no longer be 100% compatible with newer processors. On our Broadwell cluster, PGI defaults to -tp haswell when no explicit option is provided, which is how Open MPI was being built. With Broadwell being a tock up from Haswell, Portland’s expectation must have been that any Haswell optimizations would work on Broadwell: the -tp option skips from haswell straight to skylake. Perhaps that is NOT the case. To test that theory, I:

  • rebuilt Open MPI 3.1.2 using PGI, forcing the -tp px option for only the most basic optimizations
  • rebuilt my simple MPI program against that Open MPI, also using -tp px -Wl,-z,noexecstack

This combination of Portland compiler, Open MPI, and PSM2 does NOT fail to map the HFI capabilities AND does not segfault. This naturally calls into question what level of PGI processor optimization is 100% reliable on a Broadwell system.
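For completeness, the test rebuild looked roughly like this (a sketch; the install prefix is a placeholder, not the exact path used):

# Rebuild Open MPI 3.1.2 with PGI, forcing the most conservative -tp target
./configure CC=pgcc CXX=pgc++ FC=pgfortran \
    CFLAGS="-tp px" CXXFLAGS="-tp px" FCFLAGS="-tp px" \
    --with-psm2 --prefix=/opt/openmpi-3.1.2-pgi-px
make -j8 && make install

# Rebuild the test program against that Open MPI, also at -tp px and with a non-executable stack
/opt/openmpi-3.1.2-pgi-px/bin/mpicc -tp px -Wl,-z,noexecstack mpibounce.c -o mpibounce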