Hi,
I am using the pghpf compiler to compile my programs.
I am working on a cluster with a pbs queuing system.
I didn’t have any problems with version 5.x of the compiler.
We recently upgraded to version 6, and that’s where my
problem began.
My executable is submitted to the queue with the following script:
#!/bin/tcsh
#PBS -q q2w1n
#PBS -j oe -k oe
#PBS -l nodes=1:ppn=2
#PBS -v BeginCFG,EndCFG
#cd /home/fgcao/runVacuumResp
cd /home/fgcao/fbissey/SandBox
# Set run parameters
set ncpus=2
setenv LD_LIBRARY_PATH /opt/pgi601/linux86/6.0/lib
###########################################################
# Script Name must be 15 characters or less
# To run:
# qsub -v BeginCFG=001,EndCFG=001 lyplan2.csh
#
###########################################################
# Deal with file names
#
set exeFlags = "-n $ncpus"
set beta = "b460"
set size = "s16t32"
set imp = "IMP"
set basedir = "/home/fgcao/Configurations/"$size"/"
set dir = "su3"$beta$size$imp
set baseConfig = $dir"c"
set yorn = ".true."
set smear3d = 30
set prefix = "./results/"
set exeName = "./VacuumRespLYplan"$size
set thisReport = "RunStatusLYplan"$size"-"$ncpus"c"$BeginCFG"-"$EndCFG
echo `date`
pwd
# Run the parallel program
echo "mpiexec -verbose $exeFlags $exeName -pghpf -np $ncpus > $thisReport"
mpiexec -verbose $exeFlags $exeName -pghpf -np $ncpus > $thisReport << ....END
$basedir
$baseConfig
$BeginCFG
$EndCFG
$prefix
3 three-loop improved fMuNu
$smear3d
1 1: action and topological charge, 2: electric and magnetic fields
$yorn
....END
The script is submitted using qsub. Programs compiled with pghpf
version 5 work fine; with version 6 they don’t.
In one case I get the following message:
mpiexec -verbose -n 2 ./VacuumRespLYplans16t32 -pghpf -np 2 > RunStatusLYplans16t32-2c192-192
0 - MPI_SEND : Invalid rank 1
[0] Aborting program !
[0] Aborting program!
0 - MPI_SEND : Invalid rank 1
[0] Aborting program !
[0] Aborting program!
mpiexec: Warning: tasks 0-1 exited with status 1.
If I remove “-pghpf -np 2” from the script, the output becomes:
PGFIO-F-217/formatted read/unit=5/attempt to read past end of file.
File name = stdin formatted, sequential access record = 1
In source file VacuumRespLY_plan.f, at line number 119
[0] MPI Abort by user Aborting program !
[0] Aborting program!
In this case the program cannot read its input. I also see this
second behavior on an amd64 cluster, even without removing the
“-pghpf -np 2” argument.
Running the program interactively or changing the script to
execute outside of the queue (and on one processor) works.
Only when I try to run it with mpiexec in the queuing system
do I have problems.
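One way to narrow this down might be to stage the input in a file and redirect standard input from it, rather than using a here-document. The following is a minimal sketch under assumptions, not something I have verified on the cluster: it is written in POSIX sh rather than tcsh for brevity, the file name lyplan_input.$$ and the LAUNCH variable are hypothetical, and LAUNCH defaults to cat so that the sketch runs without mpiexec. The values mirror the settings in the script above.

```sh
#!/bin/sh
# Sketch only: stage the program's standard input in a file, then redirect
# from that file instead of feeding it through a here-document at launch.

# Values copied from the job script above (BeginCFG/EndCFG normally come
# from "qsub -v"; the 001 defaults here are only for a dry run).
basedir="/home/fgcao/Configurations/s16t32/"
baseConfig="su3b460s16t32IMPc"
BeginCFG="${BeginCFG:-001}"
EndCFG="${EndCFG:-001}"
prefix="./results/"
smear3d=30
yorn=".true."

# Hypothetical file name; any writable location would do.
input_file="${TMPDIR:-/tmp}/lyplan_input.$$"

cat > "$input_file" <<EOF
$basedir
$baseConfig
$BeginCFG
$EndCFG
$prefix
3 three-loop improved fMuNu
$smear3d
1 1: action and topological charge, 2: electric and magnetic fields
$yorn
EOF

# Count the staged lines; the here-document in the job script has nine.
nrecords=$(wc -l < "$input_file" | tr -d ' ')
echo "staged $nrecords input records in $input_file"

# LAUNCH is a stand-in (assumption). It defaults to cat so the sketch runs
# anywhere; in the real job it would be the mpiexec command line from the
# script above.
LAUNCH="${LAUNCH:-cat}"
$LAUNCH < "$input_file"
```

If the staged file looks right but the program still hits end of file on unit 5, that would suggest the launcher is not forwarding standard input to the tasks, rather than a problem with the here-document itself.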
What has changed to cause this behavior? And what can I do
apart from hardwiring my input?