MM5.MPP error messages

Hello,

I’m using PGI 10.5 with mpich2 to build the MM5 model on a Linux cluster (CentOS, 64-bit).
I successfully configured mpich2 for 64 bits (I think; I am not experienced). When I built the code (make mpp), the compilation finished with some warnings.

The relevant parts of configure.user in MM5:

#-----------------------------------------------------------------------------
# 4. General commands
#-----------------------------------------------------------------------------
AR = ar ru
RM = rm -f
RM_CMD = $(RM) *.CKP *.ln *.BAK *.bak *.o *.i core errs ,* *~ *.a \
.emacs_* tags TAGS make.log MakeOut *.f !
GREP = grep -s
CC = cc
FC = pgf90 
#-----------------------------------------------------------------------------
# 7. MPP options
#-----------------------------------------------------------------------------
MPP_LAYER=RSL
PROCMIN_NS = 1
PROCMIN_EW = 1
ASSUME_HOMOGENEOUS_ENVIRONMENT = 1
#-----------------------------------------------------------------------------
#   7g1. Linux PCs.  Need Portland Group pgf77 and MPICH.
#-----------------------------------------------------------------------------
RUNTIME_SYSTEM = "linux"
MPP_TARGET=$(RUNTIME_SYSTEM)
## edit the following definition for your system
LINUX_MPIHOME = /opt/mpich2/pgi
MFC = $(LINUX_MPIHOME)/bin/mpif90
MCC = $(LINUX_MPIHOME)/bin/mpicc
MLD = $(LINUX_MPIHOME)/bin/mpif90
FCFLAGS = -I/path/to/mm5/include -I$(LIBINCLUDE) -byteswapio -Mnosgimp -fastsse -Mcray=pointer -mp -DDEC_ALPHA 
LDOPTIONS = -byteswapio -fastsse -Mcray=pointer -mp
LOCAL_LIBRARIES = -L$(LINUX_MPIHOME)/build/LINUX/ch_p4/lib -lfmpich -lmpich
MAKE = make -i -r
AWK = awk
SED = sed
CAT = cat
CUT = cut
EXPAND = expand
M4 = m4
CPP = /lib/cpp -C -P -traditional
CPPFLAGS = -DMPI -Dlinux -DSYSTEM_CALL_OK
CFLAGS = -DMPI -I$(LINUX_MPIHOME)/include
ARCH_OBJS =  milliclock.o
IWORDSIZE = 4
RWORDSIZE = 4
LWORDSIZE = 4

Warnings:

PGC-W-0156-Type not specified, ‘int’ assumed (domain_def.c: 193)
PGC-W-0155-Long value is passed to a nonprototyped function - argument #3 (domain_def.c: 334)
PGC-W-0155-Long value is passed to a nonprototyped function - argument #3 (domain_def.c: 335)


Before running, I set the stack limit in my shell:

ulimit -s unlimited

Finally, I ran mm5.mpp:

mpirun machinefile.LINUX -np 32 mm5.mpp

The error messages are here:

node01 – rsl_nproc_all 32, rsl_myproc 9

:

node01 – rsl_nproc_all 32, rsl_myproc 21

rank 5 in job2 node01_39081 caused collective abort of all rank

exit status of rank 5 : killed by signal 11

rank 4 in job2 node01_39081 caused collective abort of all rank

exit status of rank 4 : killed by signal 11

rank 3 in job2 node01_39081 caused collective abort of all rank

exit status of rank 3 : killed by signal 11


What’s wrong?
I don’t have any clues.
Can anyone help me?
Any comments would be greatly appreciated.

Thank you.

Hi happyez,

Every time I’ve seen an MM5 seg fault (signal 11), it has been due to a stack overflow. While you do set ‘ulimit -s unlimited’, this only applies to your current shell environment and is not propagated to each MPI process on the other nodes.

To fix this, either add ‘ulimit -s unlimited’ to your shell’s rc file (.bashrc; for csh/tcsh the equivalent in .cshrc is ‘limit stacksize unlimited’) or write a wrapper script that sets the necessary environment and then runs the application. (mpirun would then run the wrapper script instead of the executable.)
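
A minimal sketch of such a wrapper is below. The script name run_mm5.sh and the function name are made up for illustration, and it assumes bash is available on every node. It raises the stack limit, prints the limit actually in effect so you can check it per rank, and then runs whatever command it was given.

```shell
#!/bin/bash
# run_mm5.sh (hypothetical name): raise the stack limit, then run the
# command passed on the command line, e.g. ./mm5.mpp.

run_with_unlimited_stack() {
    # Raise the soft stack limit for this shell and its children. This can
    # fail if the hard limit on the node is lower, so warn instead of aborting.
    if ! ulimit -s unlimited 2>/dev/null; then
        echo "warning: could not raise stack limit on $(uname -n)" >&2
    fi
    # Report the limit in effect so it shows up in each rank's stderr.
    echo "stack limit on $(uname -n): $(ulimit -s)" >&2
    # Run the real program with its original arguments.
    "$@"
}

# Only launch something when a command was actually passed in.
if [ "$#" -gt 0 ]; then
    run_with_unlimited_stack "$@"
fi
```

After making it executable (chmod +x run_mm5.sh), you would put ./run_mm5.sh ./mm5.mpp in place of mm5.mpp on your mpirun command line. If the reported limit is still not "unlimited" on some node, the hard limit there is probably set lower and would need to be raised by an administrator.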

  • Mat