I have a single machine with 2 CPUs, each with 4 cores. May I ask how I should configure it to run multiple processes? I always get the following message:
mpdboot -n 4 -f ~/mpd.hosts_cardiac
totalnum=4 numhosts=1
there are not enough hosts on which to start all processes
What does your “machine.LINUX” file look like? Do you list your host 8 times in it, or put a “:8” after the name to let mpdboot know you have 8 cores on this system?
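For reference, here is a minimal sketch of what such a hosts file might contain. It assumes the MPD "host:ncpus" syntax and uses the host name cardiac, which is a guess based on the shell prompt that appears in this thread; the file itself would be the ~/mpd.hosts_cardiac passed to mpdboot above.

```
cardiac:8
```

As far as I know, the -n option of mpdboot counts mpd daemons (one per host) rather than MPI processes, which would explain "totalnum=4 numhosts=1". On a single machine the usual pattern would be something like "mpdboot -n 1 -f ~/mpd.hosts_cardiac" followed by "mpiexec -n 8 ./vect", but please treat this as an assumption and check it against your MPICH2 version.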
A simple test case on a single-host cluster seems to work. However, when I add a collective communication call, MPI_Bcast, it doesn’t work. The error output is:
$mpdboot
minhtuan@cardiac:\ $mpiexec -n 3 ./vect
Time to initialize MPI is 0
FORTRAN STOP
FORTRAN STOP
End test
FORTRAN STOP
rank 2 in job 1 cardiac.binf_57734 caused collective abort of all ranks
exit status of rank 2: return code 0
rank 1 in job 1 cardiac.binf_57734 caused collective abort of all ranks
exit status of rank 1: return code 0
rank 0 in job 1 cardiac.binf_57734 caused collective abort of all ranks
exit status of rank 0: return code 0
The test code is given below:
IMPLICIT NONE
INCLUDE 'mpif.h'
INTEGER :: mpierror, numprocs, myrank
DOUBLE PRECISION :: mytime, maxtime, mintime, avgtime
INTEGER :: count0, count1, count_max, count_rate, dtime
LOGICAL :: flag
INTEGER :: data
CHARACTER (MPI_MAX_ERROR_STRING+1) :: err_msg

! initialize MPI only if it has not been initialized yet
CALL MPI_Initialized(flag, mpierror)
IF (.NOT. flag) THEN
  ! start timer
  CALL SYSTEM_CLOCK(count0, count_rate, count_max)
  CALL MPI_Init(mpierror)
  IF (mpierror .NE. MPI_SUCCESS) THEN
    PRINT *, "Error Init MPI"
    STOP
  END IF
  CALL MPI_Comm_Size(MPI_COMM_WORLD, numprocs, mpierror)
  IF (mpierror .NE. MPI_SUCCESS) THEN
    PRINT *, "Error Detect Communicator size"
    STOP
  END IF
  CALL MPI_Comm_Rank(MPI_COMM_WORLD, myrank, mpierror)
  IF (mpierror .NE. MPI_SUCCESS) THEN
    PRINT *, "Error Detect process rank"
    STOP
  END IF
  ! stop timer (elapsed time in whole seconds)
  CALL SYSTEM_CLOCK(count1, count_rate, count_max)
  dtime = (count1-count0)/count_rate
  ! end timer
END IF

IF (myrank .EQ. 0) THEN
  PRINT *, "Time to initialize MPI is ", dtime
  ! initialize the value to broadcast
  data = 1000
END IF

! broadcast data from rank 0 to all ranks
CALL MPI_Bcast(data, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, mpierror)
IF (mpierror .NE. MPI_SUCCESS) THEN
  PRINT *, "Error Broadcast message"
  STOP
END IF
CALL MPI_Barrier(MPI_COMM_WORLD, mpierror)
IF (mpierror .NE. MPI_SUCCESS) THEN
  PRINT *, "Error Barrier"
  STOP
END IF
IF (myrank .EQ. 0) THEN
  PRINT *, "End test"
END IF
STOP
END
Thanks, Hongyon. I forgot to add that to the sample code, but the real code does have the finalization statement.
I have found one issue: calling flush() without any argument doesn’t work properly with MPICH2, and it causes the error I mentioned above. It works if a device ID is passed as the argument. Could you please confirm this?
Thanks, Hongyon. Did you get any forwarded email from trs? Could you please check and help me resolve the network problem I get when I run MPICH2 on a cluster rather than on a single-host machine?