Segfault with derived type in OpenMP firstprivate clause

Dear PGI team,

please consider the following Fortran test case:

program p

  type t1
    integer, allocatable :: a(:,:)
  end type

  type t2
    type(t1) :: b
  end type

  type(t2) :: x

  print *, "*"
!$omp parallel firstprivate(x)
  print *, "**"
!$omp end parallel
  print *, "***"
end

After compiling this with “pgfortran -mp” (PGI 17.4) and running the generated program, I see output like this:

 *
 **
 **
Segmentation fault (core dumped)

The number of two-star lines can vary (I have seen 0-3 on a six-core machine). gfortran 4.9+ and ifort 10 handle it well. I think this is an OpenMP 4.0 feature.
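
For comparison, here is a sketch of a variant that actually allocates the component before the parallel region and checks what each thread sees. It is not part of the original report: the program name and the nbad counter are made up for illustration, and it assumes the intended semantics are that every thread gets its own copy of x, including the allocatable component. With a compiler that handles the clause as expected, it should report zero threads with unexpected state.

```fortran
program firstprivate_sketch
  use omp_lib
  implicit none

  type t1
    integer, allocatable :: a(:,:)
  end type

  type t2
    type(t1) :: b
  end type

  type(t2) :: x
  integer :: nbad   ! hypothetical counter, not in the original test case

  allocate(x%b%a(2,2))
  x%b%a = 1
  nbad = 0

!$omp parallel firstprivate(x) reduction(+:nbad)
  ! Each thread should start with its own copy holding the original values.
  if (.not. allocated(x%b%a)) then
    nbad = nbad + 1
  else
    if (any(x%b%a /= 1)) nbad = nbad + 1
    ! Writing to the private copy must not change the original.
    x%b%a = omp_get_thread_num() + 2
  end if
!$omp end parallel

  ! The original must still hold its initial values after the region.
  if (any(x%b%a /= 1)) nbad = nbad + 1
  print *, "threads with unexpected state:", nbad
end program
```

The original test case leaves the component unallocated on purpose, so this variant only complements it; it does not replace it as a reproducer.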

Best regards,
Janus

I tried this on several systems but could not get the PGI-compiled versions to segfault on execution. Does this consistently happen everywhere for you, or is it intermittent?

As mentioned, the output changes randomly, but I always see the segfault. I observed it on two different machines with x86_64 architecture (one dual-core and one hexacore), both running Ubuntu.

Are you sure you’re using the -mp flag? It is needed to reproduce the segfault, of course.

Btw I also see it with the flang compiler:

https://github.com/flang-compiler/flang/issues/158

Cheers,
Janus

I went to an Ubuntu 16.04 system and compiled this:

gfortran -fopenmp -o test2_gfort test2.f90
ifort -fopenmp -o test2_intel test2.f90
pgfortran -mp -o test2_pgi test2.f90

% cat test2.f90
program p

type t1
integer, allocatable :: a(:,:)
end type

type t2
type(t1) :: b
end type

type(t2) :: x
integer omp_get_num_procs, omp_get_max_threads
integer omp_get_num_threads
!================================================
! Determine the number of physical cores
!===============================================
icores=omp_get_num_procs()
print *,"number of cores =",icores
!================================================
! Determine the current thread maximum
!===============================================
print *,"max threads =", omp_get_max_threads()
!================================================
! Determine the current threads assigned
!===============================================
print *,"current num threads =", omp_get_num_threads()
!================================================
! Set the OpenMP thread counts to Number of cores
!================================================
call omp_set_num_threads(icores)
print *, "*"
!$omp parallel firstprivate(x)
print *, "**"
!$omp end parallel
print *, "***"

end


% gfortran -o test2_gfort -fopenmp test2.f90
% ifort -o test2_intel -fopenmp test2.f90
% pgfortran -o test2_pgi -mp test2.f90
% test2_gfort
number of cores = 12
max threads = 12
current num threads = 1
*
**
**
**
**
**
**
**
**
**
**
**
**


% test2_intel
number of cores = 12
max threads = 12
current num threads = 1
*
**
**
**
**
**
**
**
**
**
**
**
**


% test2_pgi
number of cores = 12
max threads = 1
current num threads = 1
*
**
**
**
**
**
**
**
**
**
**
**
**


Notice that if you do not set OMP_NUM_THREADS before you run the code,
and do not set the thread count inside the program, then ifort and gfortran automatically set the thread limit to the number of cores
(I believe the CPU has 6 cores, and with hyperthreading that makes 12), while PGI sets the limit to 1.
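
To make a run independent of these compiler-specific defaults, the thread count can be pinned explicitly. The following is only a sketch (the program name is made up): it assumes that falling back to omp_get_num_procs() is the desired behavior when OMP_NUM_THREADS is not set, and it uses the Fortran 2003 intrinsic get_environment_variable to detect that case.

```fortran
program thread_default_sketch
  use omp_lib
  implicit none
  character(len=32) :: val
  integer :: stat

  ! A status of 1 means the environment variable does not exist.
  call get_environment_variable("OMP_NUM_THREADS", val, status=stat)
  if (stat == 1) then
    ! No user setting: choose the processor count explicitly instead of
    ! relying on the default of whichever compiler built the program.
    call omp_set_num_threads(omp_get_num_procs())
  end if
  print *, "max threads =", omp_get_max_threads()
end program
```

Built this way, the reported maximum should be the same for all three compilers on a given machine, which makes the outputs easier to compare.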

Actually your test case sometimes runs through for me, but sometimes I see the segfault:


./a.out 
 number of cores =           12
 max threads =            1
 current num threads =            1
 *
 **
Segmentation fault (core dumped)

(Again, varying numbers of two-star lines.)

That’s also on Ubuntu 16.04, with an Intel Core i7 CPU X 990 (6 cores + hyperthreading).

Seems to be some kind of race condition concerning the initialization of the allocatable components?

Cheers,
Janus


PS: Running it under valgrind shows all kinds of funny stuff:

$ valgrind ./a.out 
==21666== Memcheck, a memory error detector
==21666== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==21666== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==21666== Command: ./a.out
==21666== 
 number of cores =           12
 max threads =            1
 current num threads =            1
 *
==21666== Conditional jump or move depends on uninitialised value(s)
==21666==    at 0x522AE4B: pgf90_allocated (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x401751: MAIN_ (test2.f90:32)
==21666==    by 0x401243: main (in /home/janus/fort/pgi_bugs/a.out)
==21666== 
 **
==21666== Thread 2:
==21666== Conditional jump or move depends on uninitialised value(s)
==21666==    at 0x522AE4B: pgf90_allocated (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x401751: MAIN_ (test2.f90:32)
==21666== 
 **
==21666== Thread 5:
==21666== Conditional jump or move depends on uninitialised value(s)
==21666==    at 0x522AE4B: pgf90_allocated (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x401751: MAIN_ (test2.f90:32)
==21666==    by 0x3: ???
==21666==    by 0xB: ???
==21666==    by 0x4: ???
==21666== 
==21666== Conditional jump or move depends on uninitialised value(s)
==21666==    at 0x522CEC5: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x40176D: MAIN_ (test2.f90:32)
==21666==    by 0x3: ???
==21666==    by 0xB: ???
==21666==    by 0x4: ???
==21666== 
==21666== Conditional jump or move depends on uninitialised value(s)
==21666==    at 0x522DB40: reuse_alloc (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CCE5: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x40176D: MAIN_ (test2.f90:32)
==21666==    by 0x3: ???
==21666==    by 0xB: ???
==21666==    by 0x4: ???
==21666== 
==21666== Conditional jump or move depends on uninitialised value(s)
==21666==    at 0x522DC43: __fort_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CD02: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x40176D: MAIN_ (test2.f90:32)
==21666==    by 0x3: ???
==21666==    by 0xB: ???
==21666==    by 0x4: ???
==21666== 
==21666== Conditional jump or move depends on uninitialised value(s)
==21666==    at 0x522DC4F: __fort_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CD02: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x40176D: MAIN_ (test2.f90:32)
==21666==    by 0x3: ???
==21666==    by 0xB: ???
==21666==    by 0x4: ???
==21666== 
==21666== Conditional jump or move depends on uninitialised value(s)
==21666==    at 0x522DC83: __fort_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CD02: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x40176D: MAIN_ (test2.f90:32)
==21666==    by 0x3: ???
==21666==    by 0xB: ???
==21666==    by 0x4: ???
==21666== 
==21666== Use of uninitialised value of size 8
==21666==    at 0x522DD90: __fort_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CD02: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x40176D: MAIN_ (test2.f90:32)
==21666==    by 0x3: ???
==21666==    by 0xB: ???
==21666==    by 0x4: ???
==21666== 
==21666== Invalid read of size 8
==21666==    at 0x522DD90: __fort_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CD02: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x40176D: MAIN_ (test2.f90:32)
==21666==    by 0x3: ???
==21666==    by 0xB: ???
==21666==    by 0x4: ???
==21666==  Address 0xfffffffffffffffd is not stack'd, malloc'd or (recently) free'd
==21666== 
==21666== 
==21666== Process terminating with default action of signal 11 (SIGSEGV)
==21666==  Access not within mapped region at address 0xFFFFFFFFFFFFFFFD
==21666==    at 0x522DD90: __fort_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CD02: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666==    by 0x40176D: MAIN_ (test2.f90:32)
==21666==    by 0x3: ???
==21666==    by 0xB: ???
==21666==    by 0x4: ???
==21666==  If you believe this happened as a result of a stack
==21666==  overflow in your program's main thread (unlikely but
==21666==  possible), you can try to increase the size of the
==21666==  main thread stack using the --main-stacksize= flag.
==21666==  The main thread stack size used in this run was 8388608.
==21666== 
==21666== HEAP SUMMARY:
==21666==     in use at exit: 485,328 bytes in 528 blocks
==21666==   total heap usage: 529 allocs, 1 frees, 486,352 bytes allocated
==21666== 
==21666== LEAK SUMMARY:
==21666==    definitely lost: 0 bytes in 0 blocks
==21666==    indirectly lost: 0 bytes in 0 blocks
==21666==      possibly lost: 3,168 bytes in 11 blocks
==21666==    still reachable: 482,160 bytes in 517 blocks
==21666==         suppressed: 0 bytes in 0 blocks
==21666== Rerun with --leak-check=full to see details of leaked memory
==21666== 
==21666== For counts of detected and suppressed errors, rerun with: -v
==21666== Use --track-origins=yes to see where uninitialised values come from
==21666== ERROR SUMMARY: 10 errors from 10 contexts (suppressed: 0 from 0)
Killed


On another machine (Ubuntu 17.04, Intel Core i7-4500U CPU), I even run into trouble with this single-threaded variant:

program p

type t1
integer, allocatable :: a(:,:)
end type

type t2
type(t1) :: b
end type

type(t2) :: x
integer omp_get_num_procs

icores=omp_get_num_procs()
print *,"number of cores =",icores
call omp_set_num_threads(1)

print *, "*"
!$omp parallel firstprivate(x)
print *, "**"
!$omp end parallel
print *, "***"
end

Giving the output:

 number of cores =            4
 *
*** Error in `./a.out': free(): invalid pointer: 0x00007f33bd80b400 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7908b)[0x7f33bd4c608b]
/lib/x86_64-linux-gnu/libc.so.6(+0x82c3a)[0x7f33bd4cfc3a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f33bd4d3d2c]
/opt/pgi/linux86-64/17.4/lib/libpgf90.so(__fort_free+0x28)[0x7f33beed13d8]
/opt/pgi/linux86-64/17.4/lib/libpgf90.so(__fort_gfree+0x9)[0x7f33beed1489]
/opt/pgi/linux86-64/17.4/lib/libpgf90.so(+0x1ccd98)[0x7f33beebed98]
/opt/pgi/linux86-64/17.4/lib/libpgf90.so(pgf90_dealloc03+0x73)[0x7f33beebdd03]
/opt/pgi/linux86-64/17.4/lib/libpgf90.so(pgf90_dealloc_mbr03+0x4b)[0x7f33beebdedb]
./a.out[0x401527]
./a.out[0x4010f4]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f33bd46d3f1]
./a.out[0x400fda]
======= Memory map: ========
[...]

Cheers,
Janus

Just so there is no misunderstanding, we do think there is a bug here
and the flang problem does replicate. If the two instances are
the same problem, we will get both fixed.

I just had a problem demonstrating the issue with your code.

dave

Thanks for your efforts, Dave. I hope you can get it fixed soon …

Cheers,
Janus

To correct myself here: I think the bug concerns the deallocation (not the initialization) of the allocatable component, since the backtrace for the segfault looks like this:

(gdb) bt
#0  0x00007ffff77c6d90 in __fort_dealloc03 () from /opt/pgi/linux86-64/17.4/lib/libpgf90.so
#1  0x00007ffff77c5d03 in pgf90_dealloc03 () from /opt/pgi/linux86-64/17.4/lib/libpgf90.so
#2  0x00007ffff77c5edb in pgf90_dealloc_mbr03 () from /opt/pgi/linux86-64/17.4/lib/libpgf90.so
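
Since the crash happens while the private copy is being deallocated, one conceivable way to sidestep the firstprivate clause is to share x and make the per-thread copy in a procedure called from inside the region. This is a sketch only: the module name work_sketch, the subroutine work and the variable mine are hypothetical, and it has not been verified that this avoids the problem with PGI 17.4.

```fortran
module work_sketch
  implicit none

  type t1
    integer, allocatable :: a(:,:)
  end type

  type t2
    type(t1) :: b
  end type

contains

  subroutine work(x)
    type(t2), intent(in) :: x
    type(t2) :: mine   ! local variable, hence private to the calling thread

    mine = x           ! intrinsic assignment also copies the allocatable component
    print *, "**", allocated(mine%b%a)
    ! mine is deallocated automatically when the subroutine returns
  end subroutine

end module

program p_workaround
  use work_sketch
  implicit none
  type(t2) :: x

  print *, "*"
!$omp parallel shared(x)
  call work(x)
!$omp end parallel
  print *, "***"
end program
```

The copy and its cleanup then go through ordinary Fortran assignment and automatic deallocation rather than through the OpenMP privatization code, which is the part the backtrace points at.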

Cheers,
Janus

I’ll note that the bug has been fixed in the flang codebase by now, see:

https://github.com/flang-compiler/flang/commit/e3200a11bda5dde33d150c4c0e902a658a76bd20

I assume it should be easy to ‘backport’ this to the PGI codebase (if it hasn’t been fixed there already).

Can anyone tell me if the bug has been fixed in the recent release 17.7 by any chance?

Cheers,
Janus

I tried your example with 17.7, and I got things to abort right away.

I have logged the issue as TPR 24649, and I have added your note that this has already been addressed in flang.

thanks,
dave

This problem should no longer occur with the 17.9 release.

dave

Thank you very much for fixing it!

Cheers,
Janus