Dear Portland Group,
I have been using the Portland Group Compiler
(pgf90) for quite some time now. Recently we decided to upgrade the compiler
from version 4.0-2 to the newer version 5.2-4, in order to compile and
execute an atmospheric model (Eta/NCEP model). The model is fully
parallelized and runs using MPICH-1.2.0.
However, there is a major problem. When I compile the model with the 5.2
compiler (no errors during compilation), execution begins normally, but
when the time comes to write the first output file, the following errors
are shown (the p4_error lines) and the program crashes:
.
.
.
CALL MPI_ISEND… 4130350 0
CALL MPI_ISEND… 4069612 0
CALL MPI_ISEND… 4108003 0
leaving CHKOUT!!!
leaving CHKOUT!!!
leaving CHKOUT!!!
EBU: TIMESTEP NTSD= 41 FCST TIME= 3600.0
leaving CHKOUT!!!
num_procs = 2
task id, jsta, end = 0 0 1
task id, jsta, end = 1 2 3
ME, MY_ISD,MY_IED,MY_JSD,MY_JED = 0 1 135
1 109
jsta_i,jend_i,jsta_im,jend_im,jsta_im2,jend_im2= 1 107
2 107 3 107
ihour in quilt = 0
Writing in …/…/output/RUN/v_out.000.dat
Writing in …/…/output/RUN/v_out.000.init
Writing in …/…/output/RUN/v_out.000.datFIELDS
Writing in …/…/output/RUN/v_out.000.ground
Writing in …/…/output/RUN/v_out.000.Ddat
ihour in quilt = 0
p4_11721: p4_error: interrupt SIGSEGV: 11
rm_l_4_11738: p4_error: interrupt SIGINT: 2
num_procs = 2
ME, MY_ISD,MY_IED,MY_JSD,MY_JED = 1 1 135
106 213
jsta_i,jend_i,jsta_im,jend_im,jsta_im2,jend_im2= 108 213
108 212 108 211
p5_11741: p4_error: Found a dead connection while looking for messages: 4
rm_l_5_11758: p4_error: interrupt SIGINT: 2
p1_11661: p4_error: Found a dead connection while looking for messages: 4
MYPE in calculation of max: 2
P: 2 Size: 1Dust Load(gr/m2)= 0.048253 at 62 2
P: 2 Size: 2Dust Load(gr/m2)= 0.121718 at 62 2
P: 2 Size: 3Dust Load(gr/m2)= 0.066811 at 62 2
P: 2 Size: 4Dust Load(gr/m2)= 0.004337 at 62 1
P: 2 Total Dust Load(gr/m2)= 0.240015 at 62 2
P: 2 Size: 1Dust Dep(mgr/m2)= 3.00 at 61 5
P: 2 Size: 2Dust Dep(mgr/m2)= 15.77 at 55 4
p2_11681: p4_error: Found a dead connection while looking for messages: 1
rm_l_2_11698: p4_error: interrupt SIGINT: 2
rm_l_1_11678: p4_error: interrupt SIGINT: 2
2.68 at 65 8
P: 3 Size: 3Dust Dep(mgr/m2)= 7.99 at 65 8
P: 3 Size: 4Dust Dep(mgr/m2)= 8.04 at 64 16
P: 3 Total Dust dep(mgr/m2)= 24.64 at 65 8
TSHLTR initially: 295.2062
TSHLTR becoming: 293.3899
p3_11701: p4_error: Found a dead connection while looking for messages: 1
rm_l_3_11718: p4_error: interrupt SIGINT: 2
P4 procgroup file is /mnt/space17/cspir/fine/worketa_all/eta/runs/machines.
bm_list_11658: p4_error: net_recv read: probable EOF on socket: 1
2 Size: 3Dust Dep(mgr/m2)= 12.23 at 55 4
P: 2 Size: 4Dust Dep(mgr/m2)= 3.07 at 59 1
P: 2 Total Dust dep(mgr/m2)= 30.00 at 55 4
TSHLTR initially: 284.2811
TSHLTR becoming: 284.3562
.
.
.
.
The model is run on 4 processors with the following specs (all the same):
CPU: Dual Xeon 3.2GHz (64-bit)
Memory: 2GB
Linux Distribution: Fedora Core 4 - Kernel 2.6.16 - 32-bit
I am also sending you the options we use:
LIBS = -L/usr/local/mpich-1.2.0/lib -lmpich -lfmpich -lmpichf90
FFLAGS = -fast -DLITTLE -lc -lgcc_eh -Wl,-static
I have tried a number of different options (including no options at all) and
the same thing happens. However, when I compile the model using the 4.0
compiler (with the same options), everything works fine and the program
executes with no errors!
I also tried “ulimit -s unlimited”, with the same result.
I also tried the latest version of MPICH (1.2.7p1), and it still crashes.
Can you please help me? Is there an option or something else I can use to fix this?
Thank you for your time,
Christos