WRF MPI2 runtime error - set_timekeeping fails

I am trying to get WRF to run on an Opteron cluster using MPI2, but a fatal error raised in set_timekeeping.F kills the MPI run.

A. If I run ./wrf.exe on its own, it runs.

B. mpiexec -1 -n 4 /scratch/dpolzin/WRF/WRFV3/run/wrf.exe >& wrf_log.txt &
It crashes when one of the nodes gets to set_timekeeping.F line 102.

C. It must be something between MPI2 and this subroutine, or some of the flags I compiled with that keep this subroutine from working.

Any ideas?



---- Cluster and PGI info ----
2.6.18-53.1.14.el5

/export/apps/pgi/linux86-64/7.2-4/bin

pgf90 -V

pgf90 7.2-4 64-bit target on x86-64 Linux -tp k8-64e
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2008, STMicroelectronics, Inc. All Rights Reserved.


WRF V3.2.1 MODEL


Parent domain
ids,ide,jds,jde 0 0 0 0
ims,ime,jms,jme 0 0 0 0
ips,ipe,jps,jpe 0 -1 0 -1


DYNAMICS OPTION: Eulerian Mass Coordinate
alloc_space_field: domain 1 , 121876 bytes allocated
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE: set_timekeeping.F LINE: 102
WRFU_TimeSet(startTime) FAILED Routine returned error code = -1

application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2[cli_2]: aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2

Hi dierkp,

If I read this code correctly, the error means that “start_month” in your data set is bad: timeaddmonths rejects any month less than 1 or greater than MONTHS_PER_YEAR.

.... From share/set_timekeeping.F
      CALL WRFU_TimeSet(startTime, YY=start_year, MM=start_month, DD=start_day, &
                                   H=start_hour, M=start_minute, S=start_second,&
                                   rc=rc)
      CALL wrf_check_error( WRFU_SUCCESS, rc, &
                            'WRFU_TimeSet(startTime) FAILED', &
                            __FILE__ , &
                            __LINE__  )


.... From external/esmf_time_f90/ESMF_Time.F90 (i.e. WRFU_TimeSet) 
      IF ( PRESENT( MM ) ) THEN
!  PRINT *,'DEBUG:  ESMF_TimeSet():  MM = ',MM
        CALL timeaddmonths( time, MM, ierr )
        IF ( ierr == ESMF_FAILURE ) THEN
          IF ( PRESENT( rc ) ) THEN
            rc = ESMF_FAILURE
            RETURN
          ENDIF
        ENDIF
!  PRINT *,'DEBUG:  ESMF_TimeSet():  back from timeaddmonths'
      ENDIF

.... From external/esmf_time_f90/Meat.F90
SUBROUTINE timeaddmonths( time, MM, ierr )
  USE esmf_basemod
  USE esmf_basetimemod
  USE esmf_timemod
  USE esmf_calendarmod, only : MONTHS_PER_YEAR, monthbdys, monthbdysleap
  IMPLICIT NONE
  TYPE(ESMF_Time), INTENT(INOUT) :: time
  INTEGER, INTENT(IN) :: MM
  INTEGER, INTENT(OUT) :: ierr
  ! locals
  INTEGER :: nfeb
  ierr = ESMF_SUCCESS
!  PRINT *,'DEBUG:  BEGIN timeaddmonths()'
#if defined PLANET
!  time%basetime = time%basetime
#else
  IF ( ( MM < 1 ) .OR. ( MM > MONTHS_PER_YEAR ) ) THEN
    ierr = ESMF_FAILURE   !!!!! HERE'S WHERE THE ERROR OCCURS !!!!!!
  ELSE
    IF ( nfeb(time%YR) == 29 ) THEN
      time%basetime = time%basetime + monthbdysleap(MM-1)
    ELSE
      time%basetime = time%basetime + monthbdys(MM-1)
    ENDIF
  ENDIF
#endif
END SUBROUTINE timeaddmonths
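
To make the failing condition concrete, here is a rough Python paraphrase of that range test. It is not WRF code: the function name month_is_valid is made up for this sketch, and MONTHS_PER_YEAR is assumed to be 12 (the standard calendar). It only shows which MM values would come back as ESMF_FAILURE.

```python
# Assumption: MONTHS_PER_YEAR is 12, as in a standard (non-PLANET) WRF build.
MONTHS_PER_YEAR = 12


def month_is_valid(mm: int) -> bool:
    """Paraphrase of the test in timeaddmonths(): MM must lie in 1..MONTHS_PER_YEAR."""
    return 1 <= mm <= MONTHS_PER_YEAR


if __name__ == "__main__":
    # 0 is included because it is what an unset or zeroed value would look like.
    for mm in (-1, 0, 1, 12, 13):
        status = "ok" if month_is_valid(mm) else "ESMF_FAILURE"
        print(f"MM = {mm:3d} -> {status}")
```

Note that a start_month of 0 fails this test just like 13 does, so a value that never got set on one process would be enough to trigger the abort.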

Also, this output doesn’t look correct to me. It seems that the parent domain bounds shouldn’t all be zero, but I’m not a WRF expert, so it could be valid.

WRF V3.2.1 MODEL


Parent domain
ids,ide,jds,jde 0 0 0 0
ims,ime,jms,jme 0 0 0 0
ips,ipe,jps,jpe 0 -1 0 -1


My best guess is that your data set is bad or being divided up incorrectly. Exactly why, I’m not sure. While we do have a lot of WRF users who read our forums, you might also want to post your question on the WRF User Forum (http://forum.wrfforum.com/).
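
One quick way to rule out the input side is to look at what start_month is actually set to in the namelist.input in your run directory. Below is a minimal Python sketch of that check. It assumes the usual "start_month = 01, 01," style entry in the &time_control group; the function names and the sample text are made up for illustration, and it is a plain text scan, not a full Fortran namelist parser.

```python
import re

MONTHS_PER_YEAR = 12  # assumed value of the constant used in timeaddmonths()


def read_start_months(namelist_text):
    """Return the integers on the start_month line (one per domain), or []."""
    for line in namelist_text.splitlines():
        line = line.split("!", 1)[0]  # drop trailing Fortran-style comments
        match = re.match(r"\s*start_month\s*=\s*(.*)", line, re.IGNORECASE)
        if match:
            return [int(value) for value in re.findall(r"-?\d+", match.group(1))]
    return []


def bad_start_months(values):
    """Return the values that the timeaddmonths() range test would reject."""
    return [v for v in values if v < 1 or v > MONTHS_PER_YEAR]


def check_file(path="namelist.input"):
    """Read a namelist file and return (all start_month values, rejected values)."""
    with open(path) as handle:
        months = read_start_months(handle.read())
    return months, bad_start_months(months)


# Hypothetical sample, only to show the behaviour; not taken from the poster's run.
SAMPLE = """
&time_control
 start_year  = 2010, 2010,
 start_month = 01,   13,
 start_day   = 15,   15,
/
"""

if __name__ == "__main__":
    months = read_start_months(SAMPLE)
    print("start_month values:", months)
    print("rejected values:   ", bad_start_months(months))
```

For a real run you would call check_file() from the directory where wrf.exe is started. I don't know how your cluster is laid out, but if /scratch is local to each node rather than shared (an assumption on my part), it may be worth running the check on every node that hosts an MPI process, since an empty result on one node would mean that process has no start_month to read.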

Hope this helps,
Mat