Seg-fault during profiling of MPI+OpenMP application

Hi all,
this is my first time on this forum. I have a problem with my application when I enable both the profiling and multithreading options. My application uses both MPI and OpenMP and is written in Fortran 90. I have followed all the instructions and workarounds given in the documentation, but the segmentation fault still appears on every run.

I use the LSF batch system to submit my application…

export OMP_NUM_THREADS=2
export OMP_STACK_SIZE=256M
mpirun -np 4 […] ./myapp.x

The error is similar to the following:
[node0009:11998] *** Process received signal ***
[node0009:11998] Signal: Segmentation fault (11)
[node0009:11998] Signal code: Address not mapped (1)
[node0009:11998] Failing at address: 0xffffffffffffff18
[node0009:11998] *** End of error message ***

$ ldd myapp.x
libmpi_f90.so.0 => /opt/openmpi/1.2.8/pgi–8.0-2–binary/lib/libmpi_f90.so.0 (0x00002aaaaacc6000)
libmpi_f77.so.0 => /opt/openmpi/1.2.8/pgi–8.0-2–binary/lib/libmpi_f77.so.0 (0x00002aaaaaf22000)
libmpi.so.0 => /opt/openmpi/1.2.8/pgi–8.0-2–binary/lib/libmpi.so.0 (0x00002aaaab152000)
libopen-rte.so.0 => /opt/openmpi/1.2.8/pgi–8.0-2–binary/lib/libopen-rte.so.0 (0x00002aaaab4a1000)
libopen-pal.so.0 => /opt/openmpi/1.2.8/pgi–8.0-2–binary/lib/libopen-pal.so.0 (0x00002aaaab76f000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00002aaaab9eb000)
librt.so.1 => /lib64/librt.so.1 (0x00002aaaabbf7000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaabe00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x00002aaaac004000)
libutil.so.1 => /lib64/libutil.so.1 (0x00002aaaac21d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaac420000)
libpgbind.so => /opt/pgi/linux86-64/8.0-2/lib/libpgbind.so (0x00002aaaac63a000)
libnuma.so => /opt/pgi/linux86-64/8.0-2/lib/libnuma.so (0x00002aaaac73c000)
libm.so.6 => /lib64/libm.so.6 (0x00002aaaac83d000)
libc.so.6 => /lib64/libc.so.6 (0x00002aaaacac0000)
libpgf90.so => /opt/pgi/linux86-64/8.0-2/libso/libpgf90.so (0x00002aaaace11000)
libpgf90_rpm1.so => /opt/pgi/linux86-64/8.0-2/libso/libpgf90_rpm1.so (0x00002aaaad1cc000)
libpgf902.so => /opt/pgi/linux86-64/8.0-2/libso/libpgf902.so (0x00002aaaad2ce000)
libpgf90rtl.so => /opt/pgi/linux86-64/8.0-2/libso/libpgf90rtl.so (0x00002aaaad3e1000)
libpgftnrtl.so => /opt/pgi/linux86-64/8.0-2/libso/libpgftnrtl.so (0x00002aaaad504000)
libpgc.so => /opt/pgi/linux86-64/8.0-2/libso/libpgc.so (0x00002aaaad632000)
/lib64/ld-linux-x86-64.so.2 (0x00002aaaaaaab000)

Compile/linker flags are:
MPIF90 = mpif90
CC = pgcc
F77 = pgf77
CFLAGS = -mp -O1 -W0,-profile,lines -Mprof=lines
F90FLAGS = -mp -O1 -r8 -W0,-profile,lines -Mprof=lines
LD = mpif90 -mp -lpgnod_prof_openmpi -W0,-profile,lines -Mprof=lines

If I remove the “-mp” flag, the application runs without faults!
I’m using the latest available version of the PGI compiler (8.0-2).

How can I resolve this problem?

Thank you very much in advance!

Hi filippo.spiga,

The most likely cause is a stack overflow. I see that you’ve tried increasing the stack size, but accidentally misspelled the environment variable. The OpenMP 3.0 variable is “OMP_STACKSIZE”. If removing the second underscore doesn’t work, try increasing the size to 512M.
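
As a rough sketch of what the corrected job script settings might look like (assuming a bash-like shell; the "-x" option is Open MPI's way of forwarding an environment variable to the launched processes, and whether it is needed depends on how your LSF setup propagates the environment), something along these lines could be tried. The mpirun line is left commented out with its arguments elided exactly as in the original post:

```shell
# Drop the misspelled variable so it cannot be mistaken for the real one later.
unset OMP_STACK_SIZE

export OMP_NUM_THREADS=2
# OpenMP 3.0 spelling: no underscore between STACK and SIZE.
# 512M is the larger value suggested above; 256M was the original attempt.
export OMP_STACKSIZE=512M

# Print the OpenMP-related variables the job will actually see.
env | grep '^OMP_'

# Launch line from the original post, forwarding both variables to every rank
# (arguments elided as in the original):
# mpirun -np 4 -x OMP_NUM_THREADS -x OMP_STACKSIZE [...] ./myapp.x
```

If the "env" output still lists OMP_STACK_SIZE, the old setting is coming from somewhere else, for example a shell startup file or the batch submission script.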

If it still seg faults, then try running your code in the PGI debugger, PGDBG, to get a better understanding of the error. You’ll need to use the MPI libraries that accompany your compilers. If you have the PGI CDK product, you’ll be able to debug your program using either MPI or MPI-2 on your cluster. Otherwise, you’ll need to use MPI-1 (MPICH) and will be limited to debugging on a single node.

  • Mat