PGI HPF issues

  1. I’m trying to use PGI HPF-generated code with MPI over InfiniBand. According to the FAQ at http://www.pgroup.com/support/link.htm one has to update libpghpf_mpi.a so that PGI’s HPF runtime works with another MPI implementation. The FAQ says to download mpi.c from ftp://ftp.pgroup.com/x86/3.1/linux86-patches/mpi/mpi.c but, after logging in over FTP, I don’t find anything listed in the top-level directory at ftp://ftp.pgroup.com/

230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd x86
550 Failed to change directory.
ftp> ls
227 Entering Passive Mode (69,30,37,38,252,143).
150 Here comes the directory listing.
226 Transfer done (but failed to open directory).
ftp> ls
227 Entering Passive Mode (69,30,37,38,104,194).
pwd
150 Here comes the directory listing.
226 Transfer done (but failed to open directory).

  2. ‘man pghpf’ says mpich1 is the default target, yet the generated code is not linked with MPI when using default options:

$ pghpf -O3 jac.f

$ ldd ./a.out
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003e32400000)
librt.so.1 => /lib64/librt.so.1 (0x0000003e33800000)
libm.so.6 => /lib64/libm.so.6 (0x0000003e31c00000)
libc.so.6 => /lib64/libc.so.6 (0x0000003e31800000)
/lib64/ld-linux-x86-64.so.2 (0x0000003e31400000)

  3. When compiling with -Mmpi, I get the following error:

$ pghpf -Mmpi -O3 jac.f
/opt/pgi/linux86-64/2011/mpi/mpich/lib: file not recognized: Is a directory


  4. If I run

$ pghpf -Mmpi2 -O3 jac.f

I get
/usr/bin/ld: cannot find -lfmpich
even though /opt/pgi/linux86-64/11.10/mpi/mpich/ exists

The only command that appears to link against MPI is one that uses -Mmpi2 together with another MPI implementation’s library directory:

$ pghpf -Mmpi2 -O3 jac.f -L /usr/mpi/intel/mvapich2-1.4/lib

$ ldd a.out
libfmpich.so.1.1 => /usr/mpi/intel/mvapich2-1.4/lib/libfmpich.so.1.1 (0x00002afdccef8000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003e32400000)
libmpich.so.1.1 => /usr/mpi/intel/mvapich2-1.4/lib/libmpich.so.1.1 (0x00002afdcd13c000)
librt.so.1 => /lib64/librt.so.1 (0x0000003e33800000)
libm.so.6 => /lib64/libm.so.6 (0x0000003e31c00000)
libc.so.6 => /lib64/libc.so.6 (0x0000003e31800000)
librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x0000003e32c00000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00002afdcd54d000)
libibumad.so.2 => /usr/lib64/libibumad.so.2 (0x0000003e32800000)
libimf.so => /opt/intel/Compiler/11.1/064/lib/intel64/libimf.so (0x00002afdcd75b000)
libsvml.so => /opt/intel/Compiler/11.1/064/lib/intel64/libsvml.so (0x00002afdcdaed000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003e37000000)
libintlc.so.5 => /opt/intel/Compiler/11.1/064/lib/intel64/libintlc.so.5 (0x00002afdcdd04000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003e32000000)
/lib64/ld-linux-x86-64.so.2 (0x0000003e31400000)
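The two ldd listings above can be compared mechanically. The following is a minimal sketch, not a PGI tool: the helper name mpi_shared_libs and the "lib...mpi...so" name pattern are assumptions made for illustration. Note that ldd only reports shared libraries, so an MPI that is linked statically from an archive would not appear in its output; the nm line is a rough way to look for MPI symbols in that case.

```shell
#!/bin/sh
# Print the MPI-looking shared libraries found in `ldd` output read from stdin.
# The "lib...mpi...so" name pattern is an illustrative assumption.
mpi_shared_libs() {
    awk '{ print $1 }' | grep -Ei '^lib[^/]*mpi[^/]*\.so' || true
}

# Usage against a real binary (only runs if ./a.out exists).
if [ -x ./a.out ]; then
    ldd ./a.out | mpi_shared_libs
    # A statically linked MPI does not show up in ldd; count MPI_ symbols instead.
    nm ./a.out 2>/dev/null | grep -c 'MPI_' || true
fi
```

An empty result from the function together with a zero symbol count would suggest that no MPI library was linked at all.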

And when this code is run, I get

$ mpirun_rsh -np 4 -host …/hosts ./a.out -pghpf -np 4 -stat all
Fatal error in MPI_Comm_rank:
Invalid communicator, error stack:
MPI_Comm_rank(107): MPI_Comm_rank(comm=0x5b, rank=0x6b8b14) failed
MPI_Comm_rank(65).: Invalid communicator
Fatal error in MPI_Comm_rank:
Invalid communicator, error stack:
MPI_Comm_rank(107): MPI_Comm_rank(comm=0x5b, rank=0x6b8b14) failed
MPI_Comm_rank(65).: Invalid communicator
Fatal error in MPI_Comm_rank:
Invalid communicator, error stack:
MPI_Comm_rank(107): MPI_Comm_rank(comm=0x5b, rank=0x6b8b14) failed
MPI_Comm_rank(65).: Invalid communicator
Fatal error in MPI_Comm_rank:
Invalid communicator, error stack:
MPI_Comm_rank(107): MPI_Comm_rank(comm=0x5b, rank=0x6b8b14) failed
MPI_Comm_rank(65).: Invalid communicator
MPI process (rank: 0) terminated unexpectedly on node19.local
Exit code -5 signaled from node19
MPI process (rank: 1) terminated unexpectedly on node20.local
MPI process (rank: 2) terminated unexpectedly on node21.local
MPI process (rank: 3) terminated unexpectedly on node23.local

which I assume is because PGI HPF’s MPI interface needs the update described in (1).
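The communicator handle in the error stack is a useful clue for checking that assumption. comm=0x5b is decimal 91, which is, to the best of my knowledge, the integer that MPICH 1.x headers use for MPI_COMM_WORLD, while MPICH2-derived libraries such as MVAPICH2 use 0x44000000. If that holds, the HPF runtime was built against MPICH1 headers and is handing an MPICH1 handle to MVAPICH2. The sketch below only decodes the handle; both constants are assumptions that should be verified against the mpi.h of each installation, and the function name classify_comm is hypothetical.

```shell
#!/bin/sh
# Assumed MPI_COMM_WORLD handle values; verify against each library's mpi.h.
MPICH1_COMM_WORLD=91            # assumption: MPICH 1.x
MPICH2_COMM_WORLD=1140850688    # assumption: MPICH2/MVAPICH2 (0x44000000)

# Classify a communicator handle as printed in an MPI error stack.
classify_comm() {
    value=$(( $1 ))   # shell arithmetic accepts hex input such as 0x5b
    if [ "$value" -eq "$MPICH1_COMM_WORLD" ]; then
        echo "$value: MPICH1-style MPI_COMM_WORLD"
    elif [ "$value" -eq "$MPICH2_COMM_WORLD" ]; then
        echo "$value: MPICH2-style MPI_COMM_WORLD"
    else
        echo "$value: unknown handle"
    fi
}

classify_comm 0x5b
```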

Any help would be appreciated.

numenorean,

Sorry you are having problems. That 3.1 release information is very old, and
I am not sure the mpi.c files still work.

I see we have a problem in our driver adding the string
/opt/pgi/linux86-64/2011/mpi/mpich/lib more than once to the command
stream (as in -rpath /opt/pgi/linux86-64/2011/mpi/mpich/lib
/opt/pgi/linux86-64/2011/mpi/mpich/lib), causing it to fail.
I have logged this as TPR 18407.

As a practical matter, does the code you are running not have MPI
calls in it to implement the multiprocessing? If it does,
pgfortran -Mmpi
is a better way to go. HPF is a dated technology, and MPI, OpenMP,
and OpenACC are current and are supported in C and C++ as well as Fortran, and as a result, there are many more publicly available codes written that use them.

regards,
dave

Dear Dave,

The code I’m trying to compile is an HPF code, i.e., it contains only directives, and I expected pghpf to generate the MPI calls for it. This is the code:

      program jacobi
      integer N, m
      PARAMETER (N=4096)
      double precision a(N, N), b(N, N)

!HPF$ processors p(4,1)
!HPF$ template t(N,N)
!HPF$ align a(i,j) with t(i,j)
!HPF$ align b(i,j) with t(i,j)
!HPF$ distribute t(block,block) onto p

C -- Initializations --

C -- Jacobi --
      do time = 1, 10
!HPF$ INDEPENDENT
        do j = 2, N - 1
!HPF$ INDEPENDENT
          do i = 2, N - 1
            a(i, j) = 0.25 * (b(i - 1, j) + b(i + 1, j) + b(i, j - 1) +
     *                b(i, j + 1))
          enddo
        enddo
        do j = 2, N - 1
          do i = 2, N - 1
            b(i, j) = a(i, j)
          enddo
        enddo
      enddo

      end

Yes, I’m aware that HPF is dated, but I need to compile this code for a comparison, and PGI’s was the only HPF compiler I could find.

Thanks.

numenorean,


I have a patch for you, once 12.1 comes out. It should be out this week.
When you get it installed, send mail to trs@pgroup.com and we will send
the corrected rc file.

regards,
dave

Thanks, Dave. I await the 12.1 release then.

Thanks. I am now able to build (with -Mmpi) and run.

$ pghpf -O3 jac.f -Mmpi -Mstats
$ /opt/pgi/linux86-64/12.1/mpi/mpich/bin/mpirun -np 8 -machinefile …/hosts ./a.out -pghpf -stat alls

Now, is it possible to make the generated code use a different MPI implementation? Our cluster nodes are interconnected with InfiniBand, and we would like to use MVAPICH (MPI over InfiniBand) for best performance. I assume the MPI library linked in by PGHPF just runs MPI over TCP/IP (over GigE).

I sent mail indicating the changes needed to fix pghpf are in the release, so a patch
is not required.

regards,
dave

For mvapich under the CDK, try compiling by first putting the mvapich bin directory in your
path.

export PATH=$PGI/linux86-64/12.1/bin:$PGI/linux86-64/12.1/mpi/mvapich/bin:$PATH

and then try compiling with

mpif90 foo.f -f90=pghpf -v -Wl,--allow-multiple-definition -Wl,-t

and see if the mvapich libs are linked in.

regards,
dave
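Since more than one MPI installation on a machine can provide an mpif90 wrapper, it may be worth confirming which one the shell picks up after the PATH change. The sketch below is only an illustration: the helper name first_on_path is hypothetical, and it does the same job as `command -v mpif90` but takes the search path as an argument so that it can be tried against any directory list.

```shell
#!/bin/sh
# Print the first executable named $1 found in the colon-separated list $2.
first_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        [ -n "$dir" ] || dir=.          # an empty PATH entry means the current directory
        if [ -x "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Show which mpif90 wrapper would be used with the current PATH.
first_on_path mpif90 "$PATH" || echo "mpif90 not found on PATH"
```

If the path printed is not under the intended mvapich bin directory, the wrapper belongs to a different MPI installation. MPICH-derived wrappers generally also accept `mpif90 -show`, which prints the underlying compile and link line without running it.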

It compiles and links fine with MVAPICH, but I get errors at run time:

$ mpif90 -f90=pghpf -O3 jac.f -Mmpi -Mstats -Wl,--allow-multiple-definition
NOTE: your trial license will expire in 12 days, 13.8 hours.
NOTE: your trial license will expire in 12 days, 13.8 hours.
NOTE: your trial license will expire in 12 days, 13.8 hours.
NOTE: your trial license will expire in 12 days, 13.8 hours.
/opt/intel/Compiler/11.1/064/lib/intel64/libimf.so: warning: warning: feupdateenv is not implemented and will always fail


$ ldd a.out
libmpich.so.1.1 => /usr/mpi/intel/mvapich2-1.4/lib/libmpich.so.1.1 (0x00002b59f5b22000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003e32400000)
librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x0000003e32c00000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000003e31c00000)
libibumad.so.2 => /usr/lib64/libibumad.so.2 (0x0000003e32800000)
librt.so.1 => /lib64/librt.so.1 (0x0000003e33800000)
libfmpich.so.1.1 => /usr/mpi/intel/mvapich2-1.4/lib/libfmpich.so.1.1 (0x00002b59f5f5e000)
libm.so.6 => /lib64/libm.so.6 (0x00002b59f6177000)
libc.so.6 => /lib64/libc.so.6 (0x0000003e31800000)
libimf.so => /opt/intel/Compiler/11.1/064/lib/intel64/libimf.so (0x00002b59f63fb000)
libsvml.so => /opt/intel/Compiler/11.1/064/lib/intel64/libsvml.so (0x00002b59f678d000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003e37000000)
libintlc.so.5 => /opt/intel/Compiler/11.1/064/lib/intel64/libintlc.so.5 (0x00002b59f69a4000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003e32000000)
/lib64/ld-linux-x86-64.so.2 (0x0000003e31400000)


$ mpirun_rsh -np 4 -hostfile …/hosts ./a.out -pghpf -stat alls
Fatal error in MPI_Comm_rank:
Invalid communicator, error stack:
MPI_Comm_rank(107): MPI_Comm_rank(comm=0x5b, rank=0x684dcc) failed
MPI_Comm_rank(65).: Invalid communicator
Fatal error in MPI_Comm_rank:
Invalid communicator, error stack:
MPI_Comm_rank(107): MPI_Comm_rank(comm=0x5b, rank=0x684dcc) failed
MPI_Comm_rank(65).: Invalid communicator
Fatal error in MPI_Comm_rank:
Invalid communicator, error stack:
MPI_Comm_rank(107): MPI_Comm_rank(comm=0x5b, rank=0x684dcc) failed
MPI_Comm_rank(65).: Invalid communicator
Fatal error in MPI_Comm_rank:
Invalid communicator, error stack:
MPI_Comm_rank(107): MPI_Comm_rank(comm=0x5b, rank=0x684dcc) failed
MPI_Comm_rank(65).: Invalid communicator
MPI process (rank: 0) terminated unexpectedly on node19.local
Exit code -5 signaled from node19
MPI process (rank: 2) terminated unexpectedly on node23.local
MPI process (rank: 1) terminated unexpectedly on node20.local
MPI process (rank: 3) terminated unexpectedly on node24.local