Channel Initialization Failed Error with MPI

Hello, I’m having an issue with MPI on OS X 10.11 and PGI 16.1 (I believe this also happens for me on earlier versions of OS X and other versions of PGI).

I’ve compiled several packages, for example Parallel netCDF, and then gone to run the tests. Everything compiles fine, but when I try to run the compiled executables, I get this for all of them:

Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(433)..............: 
MPID_Init(176).....................: channel initialization failed
MPIDI_CH3_Init(70).................: 
MPID_nem_init(286).................: 
MPID_nem_tcp_init(108).............: 
MPID_nem_tcp_get_business_card(354): 
MPID_nem_tcp_init(297).............: 
MPIDI_GetIPInterface(116)..........: ioctl failed errno=6 - Device not configured

I’ve done quite a bit of searching and haven’t found a solid direction to look in. Any advice on how to approach this? If more information is needed, what would be relevant?

I just ran a quick test on our El Capitan system here, and the MPICH we ship with PGI Workstation 16.1 for OS X appears to work fine for me:

cparrott@elcapitan ~ $ mpicc -o hello_mpi hello_mpi.c
cparrott@elcapitan ~ $ mpirun -np 2 ./hello_mpi
Process 1 says ‘Hello, world!’

HELLO_MPI - Master process:
C/MPI version
An MPI example program.

The number of processes is 2.

Process 0 says ‘Hello, world!’
Elapsed wall clock time = 0.000017 seconds.

HELLO_MPI - Master process:
Normal end of execution: ‘Goodbye, world!’

09 February 2016 01:42:36 PM

Can you verify a couple of things for me?

  1. Are you using the MPICH that shipped with PGI 16.1 on OS X? (Should be in /opt/pgi/osx86-64/2016/mpi/mpich.) Or did you compile your own version of MPICH?

  2. What command line are you using to build and run your MPI programs with MPICH?

Best regards,

+chris

Hi Chris,

I’m using the included version of MPICH. I’ll walk through my process for building Parallel netCDF as an example. Here are the relevant environment variables:

PATH=/opt/pgi/osx86-64/16.1/bin:/opt/pgi/osx86-64/2015/mpi/mpich/bin:/opt/pgi/osx86-64/16.1/bin:.:/bin:/usr/bin:/sbin:/usr/sbin:/opt/X11/bin:/opt/local/bin:/usr/local/bin
MAKEFLAGS=-j 8
PGI=/opt/pgi/osx86-64/16.1
LM_LICENSE_FILE=/opt/pgi/license.dat
CC=pgcc
FC=pgf90
F90=pgf90
CPPFLAGS=-I/opt/pgi/osx86-64/2016/mpi/mpich/include
CFLAGS=-m64
CXX=
FFLAGS=-m64
F90FLAGS=-I/opt/pgi/osx86-64/2016/mpi/mpich/include
F77=pgf90
FCFLAGS=-I/opt/pgi/osx86-64/2016/mpi/mpich/include
LDFLAGS=-L/opt/pgi/osx86-64/2016/mpi/mpich/lib
MPIF90=/opt/pgi/osx86-64/2016/mpi/mpich/bin/mpif90
MPIF77=/opt/pgi/osx86-64/2016/mpi/mpich/bin/mpif77
MPICC=/opt/pgi/osx86-64/2016/mpi/mpich/bin/mpicc

Then I build Parallel netCDF with:

./configure --prefix=/usr/local/parallel-netcdf-1.5.0-pgi64 --disable-cxx
make
make install

Everything appears to compile fine, with no noticeable errors. Then I run some of the Parallel netCDF tests:

cd /usr/local/src/parallel-netcdf-1.5.0-pgi64/test/F90
make clean
make
./f90tst_parallel
./f90tst_parallel2
./f90tst_parallel3
./f90tst_parallel4
./f90tst_vars
./f90tst_vars2
./f90tst_vars3
./f90tst_vars4
./test_intent
./tst_f90
./tst_f90_cdf5
./tst_flarge
./tst_io
./tst_types2

All of the tests appear to compile fine, but when they are run, each of them outputs:

Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(474)..............: 
MPID_Init(190).....................: channel initialization failed
MPIDI_CH3_Init(89).................: 
MPID_nem_init(320).................: 
MPID_nem_tcp_init(173).............: 
MPID_nem_tcp_get_business_card(420): 
MPID_nem_tcp_init(363).............: 
MPIDI_GetIPInterface(116)..........: ioctl failed errno=6 - Device not configured
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=3715599
:
system msg for write_line failure : Bad file descriptor

Thanks for any help you can provide.

Hi,

You need to use the “mpirun” wrapper to run the Parallel netCDF tests. I am testing Parallel netCDF 1.6.1 here, which is a slightly newer version than what you have. When I run a test the same way you did, I get the same error:

cparrott@elcapitan ~/perforce/extras/pnetcdf/parallel-netcdf-1.6.1/buildosx/test/F90 $ ./f90tst_parallel
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(474)..............:
MPID_Init(190).....................: channel initialization failed
MPIDI_CH3_Init(89).................:
MPID_nem_init(320).................:
MPID_nem_tcp_init(173).............:
MPID_nem_tcp_get_business_card(420):
MPID_nem_tcp_init(363).............:
MPIDI_GetIPInterface(116)..........: ioctl failed errno=6 - Device not configured
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=3715599
:
system msg for write_line failure : Bad file descriptor
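The fd=-1 in the write_line error suggests the process had no connection to a process manager, which is consistent with the advice to start the tests through mpirun. As a rough pre-flight check, a program could test for that connection before calling MPI_Init. This is a minimal sketch under an assumption about MPICH internals: that MPICH's Hydra launcher (mpirun/mpiexec) exports a PMI_FD environment variable to each process it starts, while a binary run directly from the shell has no such variable. Verify that against your MPICH version. The function name launched_via_mpi_launcher is hypothetical, not part of MPICH or Parallel netCDF.

```c
#include <stdlib.h>

/*
 * Hypothetical pre-flight check (not part of MPICH or PnetCDF).
 * ASSUMPTION: MPICH's Hydra launcher (mpirun/mpiexec) sets PMI_FD in the
 * environment of every process it starts; a binary run directly from the
 * shell will not have it.
 * Returns 1 if PMI_FD is set and non-empty, otherwise 0.
 */
int launched_via_mpi_launcher(void)
{
    const char *fd = getenv("PMI_FD");
    return fd != NULL && fd[0] != '\0';
}
```

Calling this at the top of main() and printing a hint such as "run me with mpirun -np N" when it returns 0 would turn the opaque MPI_Init error stack into an actionable message. Other launchers, MPI implementations, or PMI versions may use different variables, so treat the check as advisory only.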

However, when I run the test using mpirun, it works fine:

cparrott@elcapitan ~/perforce/extras/pnetcdf/parallel-netcdf-1.6.1/buildosx/test/F90 $ mpirun -np 4 ./f90tst_parallel
*** TESTING F90 ./f90tst_parallel                                  ------ pass

Try using mpirun to run the rest of these tests, and see if it works for you.
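Running the whole list of tests from the earlier post through mpirun can be automated. The sketch below assumes mpirun is on PATH and that the program is started from the test/F90 directory; it uses 4 ranks only because the example above did. The helper names format_mpirun_cmd and run_tests_under_mpirun are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write "mpirun -np <np> ./<exe>" into buf.
   Returns 0 on success, -1 if snprintf failed or buf was too small. */
int format_mpirun_cmd(char *buf, size_t len, int np, const char *exe)
{
    int n = snprintf(buf, len, "mpirun -np %d ./%s", np, exe);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Run each test binary under mpirun (assumed to be on PATH) from the
   current directory. Returns the number of tests that did not exit cleanly. */
int run_tests_under_mpirun(const char *const tests[], size_t count, int np)
{
    char cmd[256];
    int failures = 0;

    for (size_t i = 0; i < count; i++) {
        if (format_mpirun_cmd(cmd, sizeof cmd, np, tests[i]) != 0) {
            fprintf(stderr, "command too long for %s\n", tests[i]);
            failures++;
            continue;
        }
        printf("%s\n", cmd);
        fflush(stdout); /* print our line before the child's output */
        if (system(cmd) != 0) {
            fprintf(stderr, "FAILED: %s\n", tests[i]);
            failures++;
        }
    }
    return failures;
}
```

For example, passing an array holding "f90tst_parallel", "f90tst_vars", "tst_io" and the other test names, with np set to 4, mirrors typing each mpirun line by hand and reports how many tests failed.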

Best regards,

+chris

Hi Chris,

It appears you are correct. Using mpirun as you suggested makes the Parallel netCDF tests pass, and it also appears to produce working results for some other software.

Thanks very much for your help.