I’m using pgilinux.904 on openSUSE 11.1 and building one program with pgf90. The Fortran program stops partway through its run and I get the response “Segmentation fault”. The same code has been tested on other Linux versions and PGI workstations and works correctly. Could my problem be caused by some execution time limit?
If it’s a signal 9 (kill), then yes, you could have an execution time limit set on your system that would cause a time out. However, a signal 9 could also mean that the OS killed your process for other reasons, such as the system running out of memory.
If it is a seg fault (signal 11), then the signal is coming from the process you’re running, not from a time out.
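One rough way to tell the two cases apart from the command line is the exit status. In bash and most sh implementations, a process that dies from a signal is reported as 128 plus the signal number. The sketch below uses child shells that signal themselves as stand-ins for the real program (an assumption made only so the example is self-contained), and also prints the CPU-time limit of the current shell:

```shell
# Stand-ins for a real program: each child shell sends a signal to itself.
# "ulimit -c 0" in the child only suppresses a core file for the SIGSEGV case.
status_kill=0
sh -c 'kill -s KILL $$' 2>/dev/null || status_kill=$?

status_segv=0
sh -c 'ulimit -c 0; kill -s SEGV $$' 2>/dev/null || status_segv=$?

# Exit status is 128 + signal number: 9 for SIGKILL, 11 for SIGSEGV.
echo "SIGKILL exit status: $status_kill"
echo "SIGSEGV exit status: $status_segv"

# CPU-time limit for this shell in seconds ("unlimited" if none is set).
cpu_limit=$(ulimit -t)
echo "CPU time limit: $cpu_limit"
```

With the real executable, running it and then checking `echo $?` gives the same kind of status.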
A segmentation violation generally occurs when your program attempts to access memory it does not have permission to access. Common causes are de-referencing a null pointer, writing past the end of an array (out-of-bounds errors), or using un-initialized pointer variables.
As to the specific reason why your program is seg faulting, the only way to tell is to use a debugger like PGDBG or gdb. (See: http://www.pgroup.com/doc/pgitools.pdf for more information about PGDBG).
While learning how to debug can take time (it’s well worth it!), the simplest thing to do is run:
pgdbg -text ../exe/initbc.exe
Next, type ‘run’. Once the program seg faults, type ‘where’ to get a stack trace. Debugging optimized code can be very difficult, though, so you may want to recompile with “-g” (debug) only.
Another useful tool that I highly recommend is Valgrind (see: http://www.valgrind.org). It’s very good at finding un-initialized memory.
nebojsa@linux-sckl:~/worketa_all/eta/bin> pgdbg -text …/exe/initbc.exe
/opt/pgi/linux86/9.0/bin//pgdbg1 -text …/exe/initbc.exe
NOTE: your trial license will expire in 2 days, 10.5 hours.
PGDBG 9.0-4 x86 (Cluster, 8 Process)
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2009, STMicroelectronics, Inc. All Rights Reserved.
Loaded: /home/nebojsa/worketa_all/eta/exe/initbc.exe
NOTE: Can’t find main function compiled -g
pgdbg>
After ‘run’ it started to calculate, but every time it stops at the same place:
…
AT: LAT,LON 135.2281 36.23340
SNOWFREE ALBEDO SET TO DEFAULT VALUE OF 20%
MAX SNOW ALBEDO SET TO DEFAULT VALUE OF 55%
Signalled SIGSEGV at 0x8091FF9, function gridst_
0x8091FF9: 89 AA 24 B1 FF FF movl %ebp,-20188(%edx)
It didn’t work. I’ve tried -stv unlimited.
Well, here are flags in make.inc:
FC = pgf90
MPI_FC = mpif90
CC = /lib/cpp -traditional-cpp
FFLAGS = -fast -DLITTLE
FFLAGS = -lfpe
CFLAGS =
in make.linux:
FC = f90
CC = /lib/cpp
FFLAGS = -O -s -DLITTLE
CFLAGS = -O
in make.hups
FC = f90
CC = /lib/cpp -P
FFLAGS = -O2 +Odataprefetch +U77
CFLAGS = -O
in make.inc.DEC
FC = f90
CC = /lib/cpp -P
FFLAGS = -O4 -DLITTLE -DDEC
CFLAGS =
and in make.inc.linux.ncep
FC = /export-1/sgi100/data/mpyle/f90_wrk
CC = /lib/cpp -P
FFLAGS = -O -s -DLITTLE
CFLAGS =
The faulting instruction is storing %ebp to memory at a large negative offset (-20188). This kind of store occurs when entering a function, and the large negative constant indicates that you have a lot of local variables.
When I see a seg fault here, it’s almost always due to the program running out of stack space. Even with ‘unlimited’ stack space, the OS does have a hard limit. You may be hitting this hard limit.
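As a quick check, the shell can report both the soft and the hard stack limit for the current session. This is only a sketch of what the shell sees. Even when both values read “unlimited”, the kernel or the 32-bit address space may still impose a practical ceiling that this output does not show.

```shell
# Soft limit: what a new process actually gets; it can be raised up to the hard limit.
soft_stack=$(ulimit -S -s)
# Hard limit: the ceiling an unprivileged user cannot raise.
hard_stack=$(ulimit -H -s)

# bash reports these in units of 1024 bytes, or prints "unlimited".
echo "soft stack limit: $soft_stack"
echo "hard stack limit: $hard_stack"
```

If the soft value is lower than the hard one, `ulimit -s` followed by the desired size raises it for that shell and for the programs started from it.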
You could have a bug in your program that causes infinite recursion, though this is less likely since it worked on a different system, unless the code is taking a different code path there. Was the system that worked 64-bit? If so, your program may not be able to run in 32-bit mode without modification.
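To compare the two machines, something like the following can be run on each. It is a minimal sketch that reports the word size of the default userland and the hardware name. It says nothing about how a particular executable was built. Note that the PGDBG banner earlier in the thread reports x86 for the failing setup.

```shell
# Word size of the default userland on this machine (typically 32 or 64).
word_bits=$(getconf LONG_BIT)
# Machine hardware name, for example i686 or x86_64.
machine=$(uname -m)

echo "LONG_BIT: $word_bits, machine: $machine"
```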
I’m just guessing though. You’ll need to debug your program to determine what’s really going on.