MPICH2 on a single machine

I have a single machine with 2 CPUs, each with 4 cores. How can I configure MPICH2 to run multiple processes on it? I always get the following message:

mpdboot -n 4 -f ~/mpd.hosts_cardiac
totalnum=4 numhosts=1
there are not enough hosts on which to start all processes

Thanks,
Tuan

Hi Tuan,

What does your “machine.LINUX” file look like? Do you list your host 8 times in it, or put a “:8” after the name to let mpdboot know you have 8 cores on this system?

- Mat

I list my hostname 8 times. I guess mpdboot only allows each hostname to occur once.

Tuan

Hi,

For MPICH2, you need to start only one daemon per machine. Once the daemon has started, you can run multiple processes on it.

% mpdboot
% mpiexec -np 2 a.out
% mpdallexit



If you have more than one machine, put all hosts in mpd.hosts and then invoke:
% mpdboot --totalnum=3

where 3 is the number of hosts in the mpd.hosts file.

% mpdtrace # check that every host listed in mpd.hosts has started a daemon

Hongyon

Thanks, Hongyon.

A simple test case on a single-host cluster seems to work. However, when I add a collective communication call, MPI_Bcast, it doesn’t work. The error output is:

$mpdboot
minhtuan@cardiac:\ $mpiexec -n 3 ./vect
Time to initialize MPI is 0
FORTRAN STOP
FORTRAN STOP
End test
FORTRAN STOP
rank 2 in job 1 cardiac.binf_57734 caused collective abort of all ranks
exit status of rank 2: return code 0
rank 1 in job 1 cardiac.binf_57734 caused collective abort of all ranks
exit status of rank 1: return code 0
rank 0 in job 1 cardiac.binf_57734 caused collective abort of all ranks
exit status of rank 0: return code 0

The test code is given below:

  IMPLICIT NONE
  INCLUDE 'mpif.h'
  
  
  INTEGER :: mpierror, numprocs, myrank
  DOUBLE PRECISION :: mytime, maxtime, mintime, avgtime
  INTEGER :: count0, count1, count_max, count_rate, dtime
  LOGICAL :: flag
  INTEGER :: data
  CHARACTER (MPI_MAX_ERROR_STRING+1) :: err_msg
  
  CALL MPI_Initialized(flag, mpierror)
  IF (.NOT. flag)  THEN
     ! start timer
     CALL SYSTEM_CLOCK(count0, count_rate, count_max)
     CALL MPI_Init(mpierror)
     IF( mpierror .NE. MPI_SUCCESS) THEN
        PRINT *, "Error Init MPI"
        STOP
     END IF
     CALL MPI_Comm_Size(MPI_COMM_WORLD, numprocs, mpierror)
     IF( mpierror .NE. MPI_SUCCESS) THEN
        PRINT *, "Error Detect Communicator size"
        STOP
     END IF

     CALL MPI_Comm_Rank(MPI_COMM_WORLD, myrank, mpierror)
     IF( mpierror .NE. MPI_SUCCESS) THEN
        PRINT *, "Error Detect process rank"
        STOP
     END IF
     
     ! stop timer
     CALL SYSTEM_CLOCK(count1, count_rate, count_max)
     dtime = (count1-count0)/count_rate
     ! end timer
  END IF
  IF (myrank .EQ. 0) THEN
     PRINT *, "Time to initialize MPI is ", dtime
     ! initialize
     data = 1000
  ENDIF
  CALL MPI_Bcast(data, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, mpierror)
  IF( mpierror .NE. MPI_SUCCESS) THEN
     PRINT *, "Error Broadcast message"
     STOP
  END IF
  
  CALL MPI_Barrier(MPI_COMM_WORLD, mpierror)
  IF( mpierror .NE. MPI_SUCCESS) THEN
     PRINT *, "Error Barrier"
     STOP
  END IF
  
  IF (myrank .EQ. 0) THEN
     PRINT *, "End test"
  END IF
  STOP

Thanks,
Tuan

Remove the last “STOP” statement from your program and add a call to MPI_FINALIZE(some_integer).
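As a minimal sketch of that change, assuming a stripped-down version of the test above: the program name bcast_test and the variable msg are hypothetical, chosen only for illustration. The point it shows is that every rank calls MPI_Finalize before the program ends, rather than ending with a bare STOP.

```fortran
PROGRAM bcast_test
  IMPLICIT NONE
  INCLUDE 'mpif.h'

  ! msg is a hypothetical name for the broadcast value
  INTEGER :: mpierror, myrank, msg

  CALL MPI_Init(mpierror)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, myrank, mpierror)

  ! Only rank 0 holds the real value before the broadcast
  msg = 0
  IF (myrank .EQ. 0) msg = 1000

  CALL MPI_Bcast(msg, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, mpierror)
  PRINT *, "rank ", myrank, " has msg = ", msg

  ! Shut MPI down on every rank instead of ending with a bare STOP
  CALL MPI_Finalize(mpierror)
END PROGRAM bcast_test
```

It would be launched the same way as the original test, for example with mpiexec -n 3 after mpdboot.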


Here is more information about MPI:

http://www.mcs.anl.gov/research/projects/mpi/


Hongyon

Thanks, Hongyon. I forgot to add that to the sample code, but the real code does have the finalization statement.

I noticed one issue: calling flush() without any argument doesn’t work properly with MPICH2, and it causes the error that I mentioned above. It works if a device ID is passed as the argument. Could you please confirm this?

Thanks,
Tuan

Hi,

Yes, you will need to give the file unit as an argument.
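A small sketch of what passing the unit might look like, assuming the output goes to standard output. The program name flush_demo is hypothetical, and OUTPUT_UNIT comes from the Fortran 2003 ISO_FORTRAN_ENV module, not from this thread. The subroutine form of flush is a compiler extension, so its exact behavior may differ between compilers; the FLUSH statement is the standard Fortran 2003 alternative.

```fortran
PROGRAM flush_demo
  USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: OUTPUT_UNIT
  IMPLICIT NONE

  WRITE (OUTPUT_UNIT, *) "progress message"

  ! Extension subroutine form, with the unit passed explicitly
  CALL FLUSH(OUTPUT_UNIT)

  ! Standard Fortran 2003 statement form
  FLUSH (OUTPUT_UNIT)
END PROGRAM flush_demo
```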

Hongyon

Thanks, Hongyon. Did you get any forwarded email from trs? Could you please check and help me resolve my network problem when I run MPICH2 on a cluster rather than a single-host machine?

Tuan