Case Dying after an Hour When Running v16 on MPI

I installed openSUSE Leap and PGI on a new machine: PGI 16.10 with MPICH 3.2. I am running on a single node with no job scheduler. When the case is built with pgfortran and run on 20 or 40 cores, it dies after about an hour, at step 188-190. The same case ran fine on my older cluster (PGI 12 and an older MPICH), and on this machine I have run it for several thousand steps (3 days) using gfortran. My best guess is that something goes wrong in MPI and builds up over time, but I have no real clue. I don’t understand why it would behave differently with different Fortran compilers. I put some info below. Any ideas?

pgfortran 16.10-0 64-bit target on x86-64 Linux -tp haswell

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 18975 RUNNING AT falcon
= EXIT CODE: 8
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Floating point exception (signal 8)
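Signal 8 is SIGFPE. On x86-64 Linux it is raised by an integer division by zero, or by a floating-point exception (invalid operation, divide-by-zero, overflow) when trapping for that exception is enabled. gfortran does not trap floating-point exceptions by default, so one possible explanation is that the model produces a NaN or Inf around step 188 and the gfortran build simply carries on while the PGI build stops. A hedged way to test that idea is to rebuild the gfortran/MPICH version with trapping switched on; `-ffpe-trap=invalid,zero,overflow`, `-fbacktrace` and `-g` are real gfortran options, while `model.f90` and `model_trap` are hypothetical names (a real multi-file model would add the flags to its own build instead):

```shell
#!/bin/sh
# Trap the first invalid operation, division by zero or overflow instead of
# letting it silently produce NaN or Inf; -g and -fbacktrace make gfortran
# print a traceback pointing at the offending source line.
TRAP_FLAGS="-g -fbacktrace -ffpe-trap=invalid,zero,overflow"

# model.f90 and model_trap are hypothetical names for this sketch; mpif90
# is assumed to be the gfortran-based MPICH wrapper.
if command -v mpif90 >/dev/null 2>&1 && [ -f model.f90 ]; then
    # Word splitting of $TRAP_FLAGS into separate options is intended here.
    mpif90 $TRAP_FLAGS -o model_trap model.f90
else
    # Otherwise only print the command that would be run.
    echo "mpif90 $TRAP_FLAGS -o model_trap model.f90"
fi
```

If the trapping gfortran build also dies near step 188, the problem is more likely in the model's arithmetic than in MPI; if it runs cleanly, suspicion shifts back toward the PGI build.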

mpiexec.hydra -version
HYDRA build details:
Version: 3.2
Release Date: Wed Nov 11 22:06:48 CST 2015
CC: pgcc
CXX: g++
F77: pgf90
F90: pgf90

ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256895
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 256895
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
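The `ulimit -a` output shows `core file size (blocks, -c) 0`, so the crashed rank leaves no core file behind. A minimal sketch, assuming the executable is rebuilt with `-g` and is called `./model` (a hypothetical name), of raising the limit in the shell that launches `mpiexec`; on a single node the ranks are started from that shell and should inherit the new limit, so the next failure can leave a core for `gdb`:

```shell
#!/bin/sh
# Raise the soft core-size limit to the hard limit (usually "unlimited").
# This affects only this shell and the processes it starts, so run mpiexec
# from the same shell afterwards.
ulimit -c "$(ulimit -H -c)"
echo "core file size limit: $(ulimit -c)"

# Show where the kernel writes core files. A value starting with "|" means
# cores are piped to a helper such as systemd-coredump; then use coredumpctl.
if [ -r /proc/sys/kernel/core_pattern ]; then
    echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
fi

# Next steps (./model is a hypothetical executable built with -g):
#   mpiexec -n 20 ./model
#   gdb ./model <core-file>    then type "bt" for the backtrace
```

A backtrace from the core points at the routine and line where the exception happened, which is more informative than the hydra banner alone.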

gcc version 4.8.5 (SUSE Linux)

Thanks,

Jeremy

PGI 16.10 includes an OpenMPI build that can be installed along with the compilers.

If it is installed, use its MPI compiler wrappers.
Add $PGI/linux86-64/2016/mpi/openmpi/bin to your $PATH and use:

CC: mpicc
CXX: mpic++
F77: mpif90 -Mbackslash
F90: mpif90 -Mbackslash
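A minimal sketch of the PATH change for a POSIX shell, assuming `$PGI` already points at the PGI installation root. `--showme` is Open MPI's wrapper option that prints the underlying compiler command line without running it (MPICH's wrappers use `-show` instead), which makes it easy to confirm that `mpif90` really calls pgfortran:

```shell
#!/bin/sh
# Put the PGI-bundled Open MPI wrappers ahead of any MPICH installation.
if [ -z "${PGI:-}" ]; then
    echo "warning: PGI is not set; the path below will be incomplete" >&2
fi
PGI_MPI_BIN="$PGI/linux86-64/2016/mpi/openmpi/bin"
PATH="$PGI_MPI_BIN:$PATH"
export PATH

# Report which mpif90 is now first on PATH and what it would execute.
if command -v mpif90 >/dev/null 2>&1; then
    command -v mpif90
    mpif90 --showme    # Open MPI: print the underlying compile/link line
else
    echo "mpif90 not found in $PGI_MPI_BIN" >&2
fi
```

After rebuilding, launch with the `mpirun` from the same directory rather than MPICH's `mpiexec.hydra`; an executable built against one MPI implementation will not run correctly under the other's launcher.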

If that works, I would suspect MPICH 3.2.
Perhaps the matching headers and/or libraries were not used to
build the executable.
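That suspicion can be checked without waiting for a crash by looking at which MPI and Fortran runtime libraries the finished executable actually loads. A minimal sketch, assuming the executable is dynamically linked (a statically linked MPI will not show up in `ldd`); the function name `check_mpi_libs` and the executable `./model` are hypothetical:

```shell
#!/bin/sh
# Print the MPI and Fortran runtime libraries an executable resolves at run
# time. Paths from more than one MPI installation (e.g. both MPICH and the
# PGI Open MPI), or both libgfortran and libpgf90*, may indicate a mixed build.
check_mpi_libs() {
    exe=$1
    if [ ! -x "$exe" ]; then
        echo "not an executable: $exe" >&2
        return 2
    fi
    # ldd lists the shared libraries the dynamic loader would use.
    if ! ldd "$exe" | grep -i -E 'mpi|pgf|gfortran'; then
        echo "no MPI or Fortran runtime libraries found for $exe"
    fi
    return 0
}

# Example (hypothetical executable name):
#   check_mpi_libs ./model
```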

dave

I’ll try again next week. I was using MPICH so that all three compilers used the same MPI version for testing, and it works fine with gfortran and ifort. I wish there were a better way to test whether it is working than waiting an hour or two to see if the program fails.