machine file for MPICH-GM

Hi,

Can anyone tell me how to use the machine file in MPICH-GM?
MPICH-GM is not under my home directory; it is under /opt/mpich-gm.
I was listing the machine names in
/opt/mpich-gm/share/machines.ch_gm.LINUX. Is that right?

Do I need MPICH-GM under my home directory?

Thanks
Sandip

Hi Sandip,

If you do not specify a machine file, the default is /opt/mpich-gm/share/machines.ch_gm.LINUX. However, a machine file can have any name and live in any directory. To use a machine file other than the default, add the mpirun flag “-machinefile <filename>” at runtime, where <filename> is your machine file name. The Myricom README-mpich-gm might help if you need help creating a machine file.
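
A minimal sketch of the above, under several assumptions: the host names are borrowed from later in this thread, ./my_app is a placeholder executable, and the machine file is assumed to take one host name per line (check the Myricom README for the exact syntax your GM version expects). The script only prints the command if mpirun.ch_gm or the executable is not available.

```shell
#!/bin/sh
# Write a machine file to an arbitrary location (name and hosts are illustrative).
machinefile="${TMPDIR:-/tmp}/mymachines.$$"
cat > "$machinefile" <<'EOF'
n1.cluster.edu
n2.cluster.edu
EOF

# Count the non-empty lines so -np matches the number of entries.
nprocs=$(grep -c . "$machinefile")

# Build the launch command; ./my_app is a placeholder executable.
cmd="mpirun.ch_gm -machinefile $machinefile -np $nprocs ./my_app"

if command -v mpirun.ch_gm >/dev/null 2>&1 && [ -x ./my_app ]; then
    $cmd
else
    echo "would run: $cmd"
fi
```

Because the file is passed explicitly, nothing needs to live under /opt/mpich-gm/share or under your home directory.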

Is there a specific error you’re encountering?

  • Mat

Hi Mat,

Thanks for your suggestion. Earlier I had mixed up the 32-bit and 64-bit machines.
I have one more question.
I wish to run my code on both 32-bit and 64-bit machines. I’ve compiled my code on both architectures, so I have two output binaries (32-bit and 64-bit).
I’m using MPICH-GM with the pgf90 compiler.
Is the mpirun command the same for both architectures? I’m using mpirun.ch_gm for 32-bit.
Thanks.

Sandip

Hi Sandip,

I believe it shouldn’t matter which mpirun you use, since the script simply launches your application. However, users sometimes wrap their executable in a script that sets environment variables such as LD_LIBRARY_PATH. If you do this, be sure the correct environment is used for each binary.
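
A rough sketch of such a wrapper, assuming hypothetical directories bin32/bin64 for the two builds and hypothetical library directories /opt/mpich-gm/lib32 and /opt/mpich-gm/lib64; adjust both to wherever your binaries and libraries really are. Any architecture not listed is treated as 32-bit here, which is a simplification.

```shell
#!/bin/sh
# Map a machine architecture string (as printed by `uname -m`) to 32 or 64.
# Anything not recognised as 64-bit is treated as 32-bit in this sketch.
select_build() {
    case "$1" in
        x86_64|amd64|ia64) echo 64 ;;
        *)                 echo 32 ;;
    esac
}

bits=$(select_build "$(uname -m)")

# Point the runtime linker at the libraries matching this binary.
# The lib32/lib64 directory names are assumptions, not MPICH-GM defaults.
LD_LIBRARY_PATH="/opt/mpich-gm/lib${bits}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

# bin32/bin64 and my_app are placeholders for the two compiled executables.
app="./bin${bits}/my_app"
if [ -x "$app" ]; then
    exec "$app" "$@"
else
    echo "no executable at $app (architecture: $(uname -m))" >&2
fi
```

The wrapper, rather than the binary, would then be the program handed to mpirun, so each node picks the build that matches its own architecture.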

  • Mat

Mat,
Since I’m using MPICH-GM, should I use mpi.h as the lib file, or is there a different lib file for MPICH-GM?
Am I doing something wrong?

Thanks.
Sandip

Hi Sandip,

While I’ve only used MPICH-GM a few times, I believe the “mpi.h” header file is common to all MPICH flavors of the same version. However, I would use the “mpi.h” header file that accompanies the MPICH-GM libraries you’re using, just in case.

Is there a specific problem you’re encountering?

  • Mat

Mat,

Could you help me resolve this problem?
PGFIO-F-231/formatted read/unit=40/error on data conversion.
File name = DATA/Par_file formatted, sequential access record = 32
In source file read_parameter_file.f90, at line number 144

And this is the relevant portion of read_parameter_file.f90:
read(IIN,*)
read(IIN,*)
read(IIN,4) junk,MODEL
read(IIN,*)
read(IIN,*) ---------------------------------------> line number 144
read(IIN,3) junk,OCEANS
read(IIN,3) junk,TOPOGRAPHY
read(IIN,3) junk,ATTENUATION
read(IIN,3) junk,USE_OLSEN_ATTENUATION

Thanks.
Sandip

Hi Sandip,

This error occurs when you’re trying to read in data that is not compatible with the data type of the variable you’re reading it into. Since “read(IIN, *)” basically says to skip to the next record, which in your case should be the next line, I doubt this is causing the problem. More likely it’s the line “read(IIN,3) junk,OCEANS”, for example if OCEANS is an INTEGER and you’re trying to read in a string.

What data is your program trying to read in at line 32 of the data file? Does it match the data type of OCEANS?
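
One quick way to look at that record is to print it from the shell. This is only a sketch; show_record is a hypothetical helper name, and the path and record number are taken from the error message above.

```shell
#!/bin/sh
# Print one record (line) of a text file together with its line number, so it
# can be compared against the type of the variable the Fortran READ expects.
show_record() {
    awk -v n="$2" 'NR == n { printf "%d: %s\n", NR, $0 }' "$1"
}

# Record 32 of DATA/Par_file is the one named in the PGFIO-F-231 message.
if [ -f DATA/Par_file ]; then
    show_record DATA/Par_file 32
fi
```

If the value printed there is not something the format can convert to the variable’s type, the read fails with a data conversion error.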

  • Mat

Mat,
You are correct.
The error is occurring at READ(IIN,3) junk,OCEANS
3 FORMAT(a,120)
and OCEANS is a logical parameter and defined as .true.
where .true. represents running on a Beowulf-type machine with local disks.

thanks.
Sandip

Hi Mat,

Thank you for your help.

I again need your help.

I’m facing this error message,
PGFIO/stdio: No such file or directory
PGFIO-F-/OPEN/unit=10/error code returned by host stdio - 2.
File name = /tmp/DATABASES_MPI/proc0005_iboolleft_xi.txt
In source file get_MPI_cutplanes_xi.f90, at line number 69

I’ve created a directory /tmp/DATABASES_MPI on each node.

the error is here,
! global point number and coordinates left MPI cut-plane
open(unit=10,file=prname(1:len_trim(prname))//'iboolleft_xi.txt',status='unknown') … line number 69

! create the name for the database of the current slide and region
write(procname,10) iproc
10 format('/proc',i4.4,'_')

I’ve listed the processor names as
n1.cluster.edu
n2.cluster.edu
. . . . . . …
in DATA/Par_file
MACHINE_FILE = mymachines.64

I’m using the shell script for compilation

# name of the file that contains the list of machines
list_of_machines=`grep MACHINE_FILE DATA/Par_file | cut -c 34-`

The original script also had this line,
grep -v '#' $list_of_machines | tr -d ' ' | tr -d 'n' > $PWD/mymachines.64
but I’ve commented it out.

In the original script, the machine names were n001, n002, n003…

Could the machine names be causing the problem?
I wasn’t able to resolve it.
Can you help me find a solution?

If needed, I can send the files.

Thanks.
Sandip

Hi Sandip,

I was able to recreate your error but only if the directory doesn’t exist. Can you double check that “/tmp/DATABASES_MPI” does exist and can be accessed by your processes? Maybe DATABASE_MPI is mis-spelled?
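
A small check along these lines can be run on each node. It is a sketch only; check_dir is a hypothetical helper, and the test covers just existence plus write and search permission for the user running it.

```shell
#!/bin/sh
# Report whether a directory exists and is usable by the current user.
check_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"
    elif [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        echo "no-access"
    else
        echo "ok"
    fi
}

# Prefix the result with the node name so output from several nodes can be compared.
echo "$(uname -n): $(check_dir /tmp/DATABASES_MPI)"
```

Since /tmp is local to each node, the result on one machine says nothing about the others, so the script needs to run everywhere the MPI processes do.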

  • Mat

Hi Mat,

Thanks for your suggestion.
I forgot to use the chmod command to change the access permissions of /tmp/DATABASES_MPI.
Now it’s reading the file.

Thanking you.
Sandip Chattopadhyay

Hi Mat,

I’ve another question.
I’m getting an error message,

sxc042200@miraclon semv12_32; ./xcheck_buffers_2D
Check all MPI buffers along xi and eta
There are 25 slices numbered from 0 to 24
There are 5 slices along xi
There are 5 slices along eta
reading slice addressing
checking row 0
checking slice ixi = 0 in that row
PGFIO-F-217/list-directed read/unit=48/attempt to read past end of file.
File name = OUTPUT_FILES/filtered_machines.txt formatted, sequential access record = 14
In source file create_serial_name_database.f90, at line number 49

The relevant code in create_serial_name_database.f90 is:
open(unit=48,file='OUTPUT_FILES/filtered_machines.txt',status='old')
do iprocloop = 0,nproc_max_loop
read(48,*) num_active_proc(iprocloop) ----------> line 49

I’ve defined the machine names as
n6,n7,n8,…

and use this line in the shell script:
grep -v '#' $list_of_machines | tr -d ' ' | tr -d 'n' > OUTPUT_FILES/filtered_machines.txt
But in the original code, the machine names were n001,n002,n003,…

I think the machine names are causing the problem. How can I change the shell script to read my machine names as n6,n7,n8,…?

Can you give me some suggestions?

Regards.
Sandip
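
To see what the filter line quoted above actually writes, it can be run by hand on a few sample names. This is only a sketch of that pipeline, not a diagnosis; filter_machines is a hypothetical wrapper name and the sample names are the ones mentioned in this thread.

```shell
#!/bin/sh
# Same filter as the quoted script line: drop lines containing "#",
# then delete every space and every letter "n".
filter_machines() {
    grep -v '#' | tr -d ' ' | tr -d 'n'
}

# Names in the style of the original script.
printf '%s\n' '# compute nodes' 'n001' 'n002' | filter_machines

# Names in the style used here.
printf '%s\n' 'n6' 'n7' 'n8' | filter_machines

# A fully qualified name keeps everything except the letter "n".
printf '%s\n' 'n1.cluster.edu' | filter_machines
```

Comparing the number of lines in OUTPUT_FILES/filtered_machines.txt (for example with wc -l) against the number of values the read loop expects may also help narrow down the end-of-file error.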