MPICH2 installation issues

I have a problem. I have just installed PGI CDK 10.5 in two places: one on a single machine and one on a cluster. They are all x86-64 machines, using ssh.

When I test on the single-machine cluster, MPICH seems to work, but MPICH2 only prints an “Alarm clock” message.

Check MPI installation
MPICH1 first
– 32-bit
hello - I am process 0 host cardiac.bi
hello - I am process 1 host cardiac.bi
hello - I am process 2 host cardiac.bi
hello - I am process 3 host cardiac.bi
– 64-bit
hello - I am process 0 host cardiac.bi
hello - I am process 1 host cardiac.bi
hello - I am process 2 host cardiac.bi hello - I am process 3 host cardiac.bi

MPICH2 second
– 32-bit
./script_mpitest: line 35: 8969 Alarm clock mpiexec -np 4 ./mpihello_mpich2
./script_mpitest: line 37: 8970 Alarm clock mpdallexit
– 64-bit
./script_mpitest: line 48: 8986 Alarm clock mpiexec -np 4 ./mpihello_mpich2
./script_mpitest: line 50: 8987 Alarm clock mpdallexit

When I test on the other cluster, MPICH works and MPICH2 also works, but it seems to run on a single machine only. I followed the guidelines in the PGI CDK installation notes.

Check MPI installation
MPICH1 first
– 32-bit
hello - I am process 0 host nfat.binf.
hello - I am process 1 host dhpr
hello - I am process 2 host fkbp
hello - I am process 3 host nfkb
– 64-bit
hello - I am process 0 host nfat.binf.
hello - I am process 1 host dhpr
hello - I am process 2 host fkbp
hello - I am process 3 host nfkb
MPICH2 second
– 32-bit
An mpd is already running with console at /tmp/mpd2.console_minhtuan on nfat.binf.gmu.edu.
Start mpd with the -n option for a second mpd on same host.
hello - I am process 2 host nfat.binf.
hello - I am process 3 host nfat.binf.
hello - I am process 0 host nfat.binf.
hello - I am process 1 host nfat.binf.
– 64-bit
hello - I am process 0 host nfat.binf.
hello - I am process 2 host nfat.binf. hello - I am process 3 host nfat.binf.

hello - I am process 1 host nfat.binf.

Could someone give me a hint to resolve this? If you need further information, please let me know.

Tuan

On a multi-node cluster:

If an mpd ring is already running, you will need to exit it first (mpdallexit).
It may have been started with only one node.

Then create an mpd.hosts file containing all the slave nodes.
mpd.hosts:
dhpr
fkbp
nfkb

Run the following commands in the directory where you created mpd.hosts.
%mpdboot --totalnum=4

#check
%mpdtrace
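As a rough sketch of the steps above (the SLAVES and TOTALNUM names are my own for illustration, not anything mpdboot requires), the mpd.hosts file can be generated and the --totalnum value derived from it; note that --totalnum counts the master node plus the slaves listed in mpd.hosts:

```shell
#!/bin/sh
# Sketch: build mpd.hosts from a slave list and derive --totalnum.
SLAVES="dhpr fkbp nfkb"

: > mpd.hosts
for node in $SLAVES; do
    echo "$node" >> mpd.hosts
done

# mpdboot starts one mpd on the local (master) host plus one per host
# in mpd.hosts, so --totalnum is the slave count plus one.
TOTALNUM=$(( $(wc -l < mpd.hosts) + 1 ))
echo "totalnum=$TOTALNUM"

# On the cluster you would then run:
# mpdboot --totalnum=$TOTALNUM
# mpdtrace    # should list the master and all three slaves
```

With the three slaves above this yields --totalnum=4, matching the command shown earlier.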


Running on a single-node cluster:
Again, make sure there is no existing mpd ring (mpdallexit). Then run:

%mpd
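The "mpd is already running with console at /tmp/mpd2.console_..." message in the first post comes from a leftover console socket. A small check like this (my own sketch; the console path follows the pattern quoted in that error message) can tell whether mpdallexit is needed before starting a fresh mpd:

```shell
#!/bin/sh
# Sketch: look for a leftover mpd console socket before starting mpd.
# The /tmp/mpd2.console_<user> path matches the error quoted above.
CONSOLE="/tmp/mpd2.console_$(id -un)"

if [ -e "$CONSOLE" ]; then
    MPD_RUNNING=yes
    echo "mpd already running (console at $CONSOLE); run mpdallexit first"
else
    MPD_RUNNING=no
    echo "no mpd console found; safe to start mpd"
fi
```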


Hongyon