script execution time limit and segmentation fault

I’m using pgilinux.904 on openSUSE 11.1 and using pgf90 for one script. The Fortran program stops in the middle of the script and I get the response “Segmentation fault”. The script has been tested on other Linux versions and PGI workstations and works correctly there. Could my problem be caused by some execution time limit?

Thanks

Nebojsa

Hi Nebojsa,

What signal are you getting?

If it’s a signal 9 (kill), then yes, you could have an execution time limit set on your system that would cause a time out. Though, a signal 9 could also mean that the OS killed your process for other reasons, such as the system running out of memory.

If it is a seg fault (signal 11), then the signal would be coming from the process you’re running, and not from a time out.

- Mat

Hi Mat,

Here’s the output when this script stops working:

./new_prep.sh: line 92: 7514 Segmentation fault …/exe/initbc.exe > initbc.out
value of err is 139
BAILING OUT BECAUSE SOMETHING FAILED!!!

Hi Nebojsa,

A segmentation violation generally occurs when your program attempts to access memory it does not have permission to access. Common causes are de-referencing a null pointer, writing past the end of an array (out-of-bounds errors), or using un-initialized pointer variables.
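As a side note, the "value of err is 139" line in your output is consistent with a seg fault: shells such as bash report 128 plus the signal number when a process is terminated by a signal. Here is a minimal sketch of that arithmetic. The variable names are just illustrative, and the value 139 is hard-coded from your output rather than read from a real run.

```shell
# Decode a shell exit status into the signal that ended the process.
# 139 is the "value of err" printed in the output above (hard-coded here).
status=139

if [ "$status" -gt 128 ]; then
    sig=$((status - 128))      # 139 - 128 = 11
    name=$(kill -l "$sig")     # ask the shell for the signal's name
    echo "terminated by signal $sig ($name)"
else
    echo "exited with status $status (no signal)"
fi
```

In your own script you would set status from "$?" immediately after the failing command instead of hard-coding it.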

As to the specific reason why your program is seg faulting, the only way to tell is to use a debugger like PGDBG or gdb. (See: http://www.pgroup.com/doc/pgitools.pdf for more information about PGDBG).

While learning how to debug can take time (it’s well worth it!), the simplest thing to do is run:

pgdbg -text  ../exe/initbc.exe

Next type ‘run’. Once the program seg faults, type ‘where’ to get a stack trace. Though, debugging optimized code can be very difficult, so you may want to recompile with “-g” (debug) only.

Another useful tool that I highly recommend is Valgrind (see: http://www.valgrind.org). It’s very good at finding un-initialized memory.

Hope this helps,
Mat

Thanks Mat, I’ll try what you suggested and inform you about the results.

Best regards!

Nebojsa

Hi Mat,

Here’s the output:

nebojsa@linux-sckl:~/worketa_all/eta/bin> pgdbg -text …/exe/initbc.exe
/opt/pgi/linux86/9.0/bin//pgdbg1 -text …/exe/initbc.exe
NOTE: your trial license will expire in 2 days, 10.5 hours.
PGDBG 9.0-4 x86 (Cluster, 8 Process)
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2009, STMicroelectronics, Inc. All Rights Reserved.
Loaded: /home/nebojsa/worketa_all/eta/exe/initbc.exe
NOTE: Can’t find main function compiled -g

pgdbg>

After ‘run’ it starts to calculate, but every time it stops at the same place:


AT: LAT,LON 135.2281 36.23340
SNOWFREE ALBEDO SET TO DEFAULT VALUE OF 20%
MAX SNOW ALBEDO SET TO DEFAULT VALUE OF 55%

Signalled SIGSEGV at 0x8091FF9, function gridst_
0x8091FF9: 89 AA 24 B1 FF FF movl %ebp,-20188(%edx)

pgdbg>

Do you know what that means?

Thanks and regards!

Nebojsa

Hi Nebojsa,

Looks like it might be a stack overflow. Try setting your environment’s stack size to unlimited.

In csh, type “unlimit” in your shell and then re-run. In bash, the command is “ulimit -s unlimited”.
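Before re-running, it can help to see what the limits actually are. The following is a rough sketch for bash (the variable names are mine): it prints the soft and hard stack limits and then tries to raise the soft limit as far as the hard limit allows. The change only applies to the current shell and anything started from it, so the program has to be launched from that same shell or script.

```shell
# Show the current stack limits (bash reports them in kbytes, or "unlimited").
soft=$(ulimit -S -s)
hard=$(ulimit -H -s)
echo "soft stack limit: $soft"
echo "hard stack limit: $hard"

# Try to raise the soft limit up to the hard limit.
# This affects only this shell and the processes it starts.
if ulimit -S -s "$hard" 2>/dev/null; then
    echo "soft stack limit is now: $(ulimit -S -s)"
else
    echo "could not raise the soft stack limit"
fi
```

If the hard limit is a finite number, “ulimit -s unlimited” cannot go beyond it without administrator changes.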

Hope this helps,
Mat

Hi Mat,

It didn’t work. I’ve tried -stv unlimited.
Well, here are flags in make.inc:
FC = pgf90
MPI_FC = mpif90
CC = /lib/cpp -traditional-cpp
FFLAGS = -fast -DLITTLE

FFLAGS = -lfpe

CFLAGS =

in make.linux:
FC = f90
CC = /lib/cpp
FFLAGS = -O -s -DLITTLE
CFLAGS = -O

in make.hups
FC = f90
CC = /lib/cpp -P
FFLAGS = -O2 +Odataprefetch +U77
CFLAGS = -O

in make.inc.DEC
FC = f90
CC = /lib/cpp -P
FFLAGS = -O4 -DLITTLE -DDEC
CFLAGS =

and in make.inc.linux.ncep
FC = /export-1/sgi100/data/mpyle/f90_wrk
CC = /lib/cpp -P
FFLAGS = -O -s -DLITTLE
CFLAGS =

perhaps it could help.

thanks and regards!

Nebojsa

Hi Nebojsa,

The assembly is saving the stack pointer into memory. This occurs when entering a function. The large negative constant indicates that you have a lot of local variables.
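For reference, the -20188 offset can be read straight from the instruction bytes in the PGDBG output. In "89 AA 24 B1 FF FF", the last four bytes are a 32-bit displacement stored little-endian, so they read as 0xFFFFB124. A small sketch of the conversion to a signed value, assuming a shell with 64-bit arithmetic such as bash:

```shell
# Displacement bytes from the faulting instruction: 24 B1 FF FF
# Little-endian, so the 32-bit value is 0xFFFFB124.
raw=$((0xFFFFB124))

# Interpret it as a signed 32-bit number (two's complement).
if [ "$raw" -ge $((0x80000000)) ]; then
    disp=$((raw - 0x100000000))
else
    disp=$raw
fi

echo "displacement: $disp bytes"
```

So the store that faults targets an address roughly 20 KB below the one held in %edx.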

When I see a seg fault here, it’s almost always due to the program running out of stack space. Even with ‘unlimited’ stack space, the OS does have a hard limit. You may be hitting this hard limit.

You could have a bug in your program that causes infinite recursion. Though, this is less likely since it worked on a different system, unless the code is taking a different code path there. Was the system that worked 64-bit? If so, then your program may not be able to run in 32-bit without modification.
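A quick way to compare the two machines is to check what architecture each kernel reports. This is only a rough sketch (the variable names are mine), and it reports the operating system's architecture, not how the executable itself was built.

```shell
# Report the machine architecture as seen by the kernel.
arch=$(uname -m)

case "$arch" in
    x86_64|amd64) bits=64 ;;
    i?86)         bits=32 ;;
    *)            bits=unknown ;;
esac

echo "machine: $arch (${bits}-bit)"
```

Running this on both the failing system and the one that works would show whether they differ in this respect.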

I’m just guessing though. You’ll need to debug your program to determine what’s really going on.

- Mat