Strange segmentation fault

Hi all,

I have a strange error and I am starting to wonder whether it could be related to a compiler issue. I run a rather complex code (a weather model) and get a segmentation fault upon calling a routine, quite some time into the execution of the code. The routine is called organize_output. I've tried to reduce it as much as possible while still getting the error, and now it looks like this:

SUBROUTINE organize_output

REAL (KIND=irealgrib) :: zprocarray_grib(ie_max,je_max,num_compute) 
REAL (KIND=ireals) :: zvarlev(ie,je,0:MAX(ke+1,nlevels)), &
 zprocarray_real(ie_max,je_max,num_compute), slev(0:MAX(ke+1,nlevels))
REAL (KIND=ireals) :: zenith_t (ie,je), zenith_w (ie,je), zenith_h (ie,je), &
 zcape_mu (ie,je), zcin_mu (ie,je), zcape_ml (ie,je), zcin_ml (ie,je), &
 zcape_3km(ie,je), zlcl_ml (ie,je), zlfc_ml (ie,je), zbrn (ie,je,ke)

  print *,'*** beginning of subroutine organize_output'
  print *,zbrn(1,1,1)
  print *,'gugu'
  print *,zprocarray_grib(1,1,1)
  print *,zvarlev(1,1,0)
  print *,zprocarray_real(1,1,1)
  print *,slev(0)
  print *,zenith_t(1,1)
  print *,zenith_w(1,1)
  print *,zenith_h(1,1)
  print *,zcape_mu(1,1)
  print *,zcin_mu(1,1)
  print *,zcape_ml(1,1)
  print *,zcin_ml(1,1)
  print *,zcape_3km(1,1)
  print *,zlcl_ml(1,1)
  print *,zlfc_ml(1,1)
  print *,'*** end of subroutine organize_output'

END SUBROUTINE organize_output

Upon execution, the output is as follows:

 *** before_call_to_organize_output
 num_compute=            1
 nlevels=           40
 ie,je=           41           51
 ie_max,je_max=           41           51
 nzmxid=          130
 *** calling
 *** beginning of subroutine organize_output
Segmentation fault (core dumped)

Sometimes (depending on exactly which lines remain in the subroutine) the error message is instead:

 *** before_call_to_organize_output
 num_compute=            1
 nlevels=           40
 ie,je=           41           51
 ie_max,je_max=           41           51
 nzmxid=          130
 *** calling
0: ALLOCATE: 18446744071899487520 bytes requested; not enough memory
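For what it's worth, the byte count in that ALLOCATE message is not a plausible request. Read as a two's-complement 64-bit integer it is negative, which suggests (this is only an inference from the number itself, not something the runtime reports) that a negative or garbage array size reached the allocator:

```latex
\begin{aligned}
2^{64} &= 18446744073709551616 \\
2^{64} - 18446744071899487520 &= 1810064096 \\
\Rightarrow\; 18446744071899487520 &\equiv -1810064096 \pmod{2^{64}}
\end{aligned}
```

The magnitude, 1810064096, is below 2^31 = 2147483648, so the same bit pattern would also arise from sign-extending a negative 32-bit size.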

The code segfaults on the first access to the zbrn array (the print of zbrn(1,1,1)). I've tried "unlimit; setenv MPSTKZ 40000000" with no effect. The code is VERY sensitive to any change in what remains in the routine: if I remove a single line (either a declaration or a print statement), it may run smoothly without any error.

My compilation options are:

pgf90 -c -I. -I/nfs/xt3-homes/users/olifu/src/lm_4.7_dwd/src -I/opt/xt-mpt/default/mpich2-64/P2/include -I/apps/netcdf/linux/include -Mfree -Mpreprocess -Kieee -Mbyteswapio -O0 -C -g -gopt -Mbounds -Mchkfpstk -Ktrap=fp -o src_output.o /nfs/xt3-homes/users/olifu/src/lm_4.7_dwd/src/src_output.f90

My machine is a Cray XT-4, and I am running on the service nodes for debugging purposes:

uname -a
Linux buin2 2.6.5-7.283-ss #4 SMP Fri Sep 28 13:24:48 PDT 2007 x86_64 x86_64 x86_64 GNU/Linux

The version of pgf90 I use is:

pgf90 -V
pgf90 7.2-4 64-bit target on x86-64 Linux -tp k8-64e

Can anyone give me an idea of what might cause this type of behaviour?

I would be very grateful for any suggestions,
Oliver

Hi Oliver,

This does seem likely to be a compiler issue related to automatic array allocation, but I'm not sure. We did have an issue with ECHAM (TPR#15414) where the size of an automatic array was being calculated after the array was allocated, but that case involved passing in an array and then using its size (via the "SIZE" intrinsic) in the declaration of a second automatic array. Your issue is different enough that I'm not positive they are related.
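To make the pattern concrete, here is a minimal sketch of an automatic array whose bounds come from SIZE of an assumed-shape argument. It is a hypothetical standalone program (the names tpr_pattern_demo, uses_size and work are made up for illustration), not code from ECHAM or from the TPR. With a compiler that evaluates the SIZE bounds before reserving storage, the self-check should pass and the program should finish normally.

```fortran
! Hypothetical illustration of the pattern described for TPR#15414:
! an automatic array sized from SIZE() of an assumed-shape dummy argument.
program tpr_pattern_demo
  implicit none
  real :: a(41, 51)   ! same extents as ie,je in the output above

  a = 1.0
  call uses_size(a)

contains

  subroutine uses_size(x)
    real, intent(in) :: x(:,:)
    ! Automatic array: its bounds must be evaluated on entry,
    ! before storage for work is reserved.
    real :: work(size(x, 1), size(x, 2))

    work = 2.0 * x
    ! Self-check: work must have exactly the shape of x.
    if (size(work, 1) /= 41 .or. size(work, 2) /= 51) error stop 'wrong automatic-array shape'
    print *, 'work is', size(work, 1), 'x', size(work, 2)
  end subroutine uses_size

end program tpr_pattern_demo
```

Note that in Oliver's routine the bounds come from variables such as ie, je and ke rather than from SIZE of a dummy argument, which is one reason the two cases may not be related.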

TPR#15414 was reported in the 7.2-5 compiler but may also have been present in 7.2-4. It was fixed in the 8.0-2 release, so you may want to try the latest release to see if it fixes the problem. If not, please send a report to PGI Customer Service at trs@pgroup.com. We will most likely need the full code, or an example that illustrates the problem.

Thanks,
Mat