Invalid communicator, error stack

Hi,

I am using the 19.4 release to build and run a C++ code. The MPI is the 3.1.3 that PGI installs itself. The code compiles fine, but during the run it dumps the following error. The same code ran successfully with the 18.10 release.

Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(122): MPI_Comm_rank(comm=0x883390, rank=0x7ffc59754acc) failed
PMPI_Comm_rank(75).: Invalid communicator
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=671705093
:
system msg for write_line failure : Bad file descriptor
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(122): MPI_Comm_rank(comm=0x883390, rank=0x7ffda962d35c) failed
PMPI_Comm_rank(75).: Invalid communicator
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=470378501
:
system msg for write_line failure : Bad file descriptor
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(122): MPI_Comm_rank(comm=0x883390, rank=0x7ffcb2711c6c) failed
PMPI_Comm_rank(75).: Invalid communicator
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=940140549
:
system msg for write_line failure : Bad file descriptor
Fatal error in PMPI_Comm_rank: Invalid communicator, error stack:
PMPI_Comm_rank(122): MPI_Comm_rank(comm=0x883390, rank=0x7ffd751fb35c) failed
PMPI_Comm_rank(75).: Invalid communicator
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=403269637
:
system msg for write_line failure : Bad file descriptor

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[19919,1],0]
Exit code: 5

Hi shervin.s,

While I can’t be sure that’s what’s happening here, this error often means there’s a mismatch between the MPI version you’re building with and the version of the runtime.

Can you please check that your LD_LIBRARY_PATH is set so that you pick up the libraries from the same OpenMPI install you used to build?
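A quick way to verify this from the command line (a generic sketch; `./a.out` and the paths are placeholders for your own binary and setup):

```shell
# Where the build-time wrapper and run-time launcher actually live:
which mpicxx mpirun

# The real compile line behind the wrapper; its -I/-L paths show which
# MPI headers and libraries were used at build time
# (MPICH-style wrappers take -show; Open MPI wrappers take -showme):
mpicxx -show

# Which libmpi the executable resolves at run time (replace ./a.out):
ldd ./a.out | grep -i mpi

# LD_LIBRARY_PATH entries, one per line, filtered for MPI installs:
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep -i mpi
```

All four should point into the same MPI install tree.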

-Mat

Hi Mat,

Thank you for the prompt response. Actually, I checked the libraries and binaries for mismatch before posting this. Unfortunately, that is not the problem. :(

Other possibilities are that you’re using a different mpirun/mpiexec than the one we ship, or that you’re including an mpi.h header file from a different install.

Again, the only time I’ve seen this error before, and from what I can tell searching the web, it typically occurs when there’s a mismatch somewhere. So I’d double-check that the compiler driver (mpicxx), the include directories (if you have a -I flag), mpirun, and the LD_LIBRARY_PATH all point to the same install.

If that all checks out, did the full application get rebuilt with 19.4, including libraries? Maybe there’s an old object file that was built with 18.10?
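One way to look for stale build products (a sketch assuming a make-based build in the current tree; `-printf` is GNU find):

```shell
# List object files and archives sorted by modification time; anything
# noticeably older than the rest may predate the switch to 19.4:
find . \( -name '*.o' -o -name '*.a' \) -printf '%T@ %p\n' | sort -n | head

# The safest fix: remove all build products and rebuild with the 19.4 toolchain
make clean && make
```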

-Mat

Hi Mat,

I just wanted to let you know that I fixed the bug. It was a mismatch between the MPI library from PGI and, apparently, a sub-library from HDF5. Anyway, it is fixed now. Thanks for your help,
Shervin
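For anyone hitting the same thing: a dependency such as HDF5 that was built against a different MPI can be spotted with ldd (the library path below is a placeholder; locate your actual libhdf5 first):

```shell
# See which MPI (if any) the HDF5 shared library was linked against;
# it should resolve to the same libmpi as your application:
ldd /path/to/libhdf5.so | grep -i mpi
```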