program p
type t1
integer, allocatable :: a(:,:)
end type
type t2
type(t1) :: b
end type
type(t2) :: x
print *, "*"
!$omp parallel firstprivate(x)
print *, "**"
!$omp end parallel
print *, "***"
end
After compiling this with “pgfortran -mp” (PGI 17.4) and running the generated program, I see output like this:
*
**
**
Segmentation fault (core dumped)
The number of two-star lines varies (I have seen 0 to 3 on a six-core machine). gfortran 4.9+ and ifort 10 handle it correctly. I think this is an OpenMP 4.0 feature.
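For reference, here is a minimal sketch of the behaviour I would expect from firstprivate on a variable with an allocatable component. It assumes the private copy is initialized as if by intrinsic assignment, which is my reading of the OpenMP rules rather than something I have confirmed for every compiler. Unlike the test case above, the component is allocated and filled before the region so that the copy can be checked; the program name and the nbad counter are only for illustration.

```fortran
program firstprivate_semantics
  use omp_lib, only: omp_get_thread_num
  implicit none
  type t1
    integer, allocatable :: a(:,:)
  end type
  type t2
    type(t1) :: b
  end type
  type(t2) :: x
  integer :: nbad

  ! Allocate and fill the component so each private copy has known contents
  allocate(x%b%a(2,2))
  x%b%a = 7
  nbad = 0

  !$omp parallel firstprivate(x) reduction(+:nbad)
  if (.not. allocated(x%b%a)) then
    ! The private copy should inherit the allocation status of the original
    nbad = nbad + 1
  else
    ! The private copy should start with the original values
    if (any(x%b%a /= 7)) nbad = nbad + 1
    ! This write should affect the private copy only
    x%b%a = omp_get_thread_num()
  end if
  !$omp end parallel

  ! After the region the original must be unchanged
  if (any(x%b%a /= 7)) nbad = nbad + 1

  if (nbad /= 0) error stop "firstprivate copy did not behave as expected"
  print *, "ok"
end program
```

The checks are accumulated through a reduction instead of stopping inside the region, so the parallel block has a single exit.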
I tried this on several systems but could not get the PGI-compiled versions to segfault on execution. Does this happen consistently for you, or is it intermittent?
As mentioned, the output changes randomly, but I always see the segfault. I observed it on two different machines with x86_64 architecture (one dual-core and one hexacore), both running Ubuntu.
Are you sure you’re using the -mp flag? It is needed to reproduce the segfault, of course.
type(t2) :: x
integer omp_get_num_procs, omp_get_max_threads
integer omp_get_num_threads
!================================================
! Determine the number of physical cores
!===============================================
icores=omp_get_num_procs()
print *,"number of cores =",icores
!================================================
! Determine the current thread maximum
!===============================================
print *,"max threads =", omp_get_max_threads()
!================================================
! Determine the current threads assigned
!===============================================
print *,"current num threads =", omp_get_num_threads()
!================================================
! Set the OpenMP thread counts to Number of cores
!================================================
call omp_set_num_threads(icores)
print *, "*"
!$omp parallel firstprivate(x)
print *, "**"
!$omp end parallel
print *, "***"
end
% gfortran -o test2_gfort -fopenmp test2.f90
% ifort -o test2_intel -fopenmp test2.f90
% pgfortran -o test2_pgi -mp test2.f90
% test2_gfort
number of cores = 12
max threads = 12
current num threads = 1
*
**
**
**
**
**
**
**
**
**
**
**
**
% test2_intel
number of cores = 12
max threads = 12
current num threads = 1
*
**
**
**
**
**
**
**
**
**
**
**
**
% test2_pgi
number of cores = 12
max threads = 1
current num threads = 1
*
**
**
**
**
**
**
**
**
**
**
**
**
Notice that if you do not set OMP_NUM_THREADS before running the code, and do not set the thread count inside the program, then ifort and gfortran default the thread limit to the number of logical cores (I believe the CPU has 6 physical cores, and hyperthreading makes that 12), while PGI defaults the limit to 1.
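Because the defaults differ between compilers, it can help to request the team size explicitly and then check what was actually delivered inside the region. The following is a small sketch of that using the standard omp_lib module instead of hand-declared functions; the program and variable names are only for illustration, and the runtime may still hand out fewer threads than requested.

```fortran
program thread_count
  use omp_lib
  implicit none
  integer :: nthreads

  nthreads = 0
  ! Request one thread per logical processor instead of relying on the default
  call omp_set_num_threads(omp_get_num_procs())

  !$omp parallel
  !$omp single
  ! Team size as seen from inside the parallel region
  nthreads = omp_get_num_threads()
  !$omp end single
  !$omp end parallel

  print *, "max threads =", omp_get_max_threads()
  print *, "team size   =", nthreads
end program
```

Setting OMP_NUM_THREADS in the environment before the run has the same effect on the default without touching the source.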
Actually your test case sometimes runs through for me, but sometimes I see the segfault:
./a.out
number of cores = 12
max threads = 1
current num threads = 1
*
**
Segmentation fault (core dumped)
(Again, varying numbers of two-star lines.)
That’s also on Ubuntu 16.04, with an Intel Core i7 CPU X 990 (6 cores + hyperthreading).
Seems to be some kind of race condition concerning the initialization of the allocatable components?
Cheers,
Janus
PS: Running it under valgrind shows all kinds of funny stuff:
$ valgrind ./a.out
==21666== Memcheck, a memory error detector
==21666== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==21666== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==21666== Command: ./a.out
==21666==
number of cores = 12
max threads = 1
current num threads = 1
*
==21666== Conditional jump or move depends on uninitialised value(s)
==21666== at 0x522AE4B: pgf90_allocated (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x401751: MAIN_ (test2.f90:32)
==21666== by 0x401243: main (in /home/janus/fort/pgi_bugs/a.out)
==21666==
**
==21666== Thread 2:
==21666== Conditional jump or move depends on uninitialised value(s)
==21666== at 0x522AE4B: pgf90_allocated (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x401751: MAIN_ (test2.f90:32)
==21666==
**
==21666== Thread 5:
==21666== Conditional jump or move depends on uninitialised value(s)
==21666== at 0x522AE4B: pgf90_allocated (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x401751: MAIN_ (test2.f90:32)
==21666== by 0x3: ???
==21666== by 0xB: ???
==21666== by 0x4: ???
==21666==
==21666== Conditional jump or move depends on uninitialised value(s)
==21666== at 0x522CEC5: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x40176D: MAIN_ (test2.f90:32)
==21666== by 0x3: ???
==21666== by 0xB: ???
==21666== by 0x4: ???
==21666==
==21666== Conditional jump or move depends on uninitialised value(s)
==21666== at 0x522DB40: reuse_alloc (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CCE5: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x40176D: MAIN_ (test2.f90:32)
==21666== by 0x3: ???
==21666== by 0xB: ???
==21666== by 0x4: ???
==21666==
==21666== Conditional jump or move depends on uninitialised value(s)
==21666== at 0x522DC43: __fort_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CD02: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x40176D: MAIN_ (test2.f90:32)
==21666== by 0x3: ???
==21666== by 0xB: ???
==21666== by 0x4: ???
==21666==
==21666== Conditional jump or move depends on uninitialised value(s)
==21666== at 0x522DC4F: __fort_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CD02: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x40176D: MAIN_ (test2.f90:32)
==21666== by 0x3: ???
==21666== by 0xB: ???
==21666== by 0x4: ???
==21666==
==21666== Conditional jump or move depends on uninitialised value(s)
==21666== at 0x522DC83: __fort_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CD02: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x40176D: MAIN_ (test2.f90:32)
==21666== by 0x3: ???
==21666== by 0xB: ???
==21666== by 0x4: ???
==21666==
==21666== Use of uninitialised value of size 8
==21666== at 0x522DD90: __fort_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CD02: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x40176D: MAIN_ (test2.f90:32)
==21666== by 0x3: ???
==21666== by 0xB: ???
==21666== by 0x4: ???
==21666==
==21666== Invalid read of size 8
==21666== at 0x522DD90: __fort_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CD02: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x40176D: MAIN_ (test2.f90:32)
==21666== by 0x3: ???
==21666== by 0xB: ???
==21666== by 0x4: ???
==21666== Address 0xfffffffffffffffd is not stack'd, malloc'd or (recently) free'd
==21666==
==21666==
==21666== Process terminating with default action of signal 11 (SIGSEGV)
==21666== Access not within mapped region at address 0xFFFFFFFFFFFFFFFD
==21666== at 0x522DD90: __fort_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CD02: pgf90_dealloc03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x522CEDA: pgf90_dealloc_mbr03 (in /opt/pgi/linux86-64/17.4/lib/libpgf90.so)
==21666== by 0x40176D: MAIN_ (test2.f90:32)
==21666== by 0x3: ???
==21666== by 0xB: ???
==21666== by 0x4: ???
==21666== If you believe this happened as a result of a stack
==21666== overflow in your program's main thread (unlikely but
==21666== possible), you can try to increase the size of the
==21666== main thread stack using the --main-stacksize= flag.
==21666== The main thread stack size used in this run was 8388608.
==21666==
==21666== HEAP SUMMARY:
==21666== in use at exit: 485,328 bytes in 528 blocks
==21666== total heap usage: 529 allocs, 1 frees, 486,352 bytes allocated
==21666==
==21666== LEAK SUMMARY:
==21666== definitely lost: 0 bytes in 0 blocks
==21666== indirectly lost: 0 bytes in 0 blocks
==21666== possibly lost: 3,168 bytes in 11 blocks
==21666== still reachable: 482,160 bytes in 517 blocks
==21666== suppressed: 0 bytes in 0 blocks
==21666== Rerun with --leak-check=full to see details of leaked memory
==21666==
==21666== For counts of detected and suppressed errors, rerun with: -v
==21666== Use --track-origins=yes to see where uninitialised values come from
==21666== ERROR SUMMARY: 10 errors from 10 contexts (suppressed: 0 from 0)
Killed
On another machine (Ubuntu 17.04, Intel Core i7-4500U CPU), I even ran into trouble with this single-threaded variant:
program p
type t1
integer, allocatable :: a(:,:)
end type
type t2
type(t1) :: b
end type
type(t2) :: x
integer omp_get_num_procs
icores=omp_get_num_procs()
print *,"number of cores =",icores
call omp_set_num_threads(1)
print *, "*"
!$omp parallel firstprivate(x)
print *, "**"
!$omp end parallel
print *, "***"
end
Just so there is no misunderstanding, we do think there is a bug here, and the flang problem does replicate. If the two instances are the same problem, we will get both fixed.
I just had trouble reproducing the issue with your code.
To correct myself: I think the bug concerns the deallocation (not the initialization) of the allocatable component, since the backtrace for the segfault looks like this:
(gdb) bt
#0 0x00007ffff77c6d90 in __fort_dealloc03 () from /opt/pgi/linux86-64/17.4/lib/libpgf90.so
#1 0x00007ffff77c5d03 in pgf90_dealloc03 () from /opt/pgi/linux86-64/17.4/lib/libpgf90.so
#2 0x00007ffff77c5edb in pgf90_dealloc_mbr03 () from /opt/pgi/linux86-64/17.4/lib/libpgf90.so
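If the crash really comes from the cleanup of the firstprivate copy, one possible way to sidestep it until a fix is available is to keep x shared and make the per-thread copy by ordinary assignment inside a procedure. This is only a sketch: I have not verified that it avoids the failing runtime path in PGI 17.4, and the module name work_mod, the subroutine work and the variable xloc are made up for illustration.

```fortran
module work_mod
  implicit none
  type t1
    integer, allocatable :: a(:,:)
  end type
  type t2
    type(t1) :: b
  end type
contains
  subroutine work(x)
    type(t2), intent(in) :: x
    type(t2) :: xloc   ! local variable, so each calling thread gets its own instance
    ! Intrinsic assignment copies the allocation status and values of the component
    xloc = x
    print *, "**", allocated(xloc%b%a)
    ! On return, the allocatable component of xloc is deallocated automatically
  end subroutine
end module

program p
  use work_mod
  implicit none
  type(t2) :: x
  print *, "*"
  ! x stays shared and is only read; no firstprivate copy is requested
  !$omp parallel shared(x)
  call work(x)
  !$omp end parallel
  print *, "***"
end program
```

This moves the copy and the automatic deallocation into plain Fortran code, which may or may not go through the same library routines as the firstprivate clause.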