Another bug to report. This one is actually relatively complex, and other compilers have had issues with this code as well, so it isn’t a surprise that there is an issue with nvfortran.
Basically, it is like a parallel STL vector class for OpenMP: each thread appends to a thread-local array, and at the end of the parallel region the thread-local arrays are combined into a single output array.
Specifically, the “t%sizes” array appears to be deallocated too early, leading to a segfault at the line
localSize = t%sizes(1, tid+1) - t%sizes(1, tid)
Both ifort and gfortran run this code correctly. I was able to work around the issue by moving the allocate(outArr) directly into the OpenMP region and splitting the finalize command into two separate parts.
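For readers without the attachments, here is a minimal sketch of the general pattern described above: thread-local append, then a combine step driven by per-thread offsets. It is not the attached codeMod.F90. The program name and all variable names other than outArr and localSize are made up for illustration, and it uses a plain 1-D offsets array rather than the 2-D t%sizes component of the real code, so it is not expected to reproduce the segfault. It needs OpenMP enabled (for example -mp with nvfortran or -fopenmp with gfortran).

```fortran
! Sketch only: illustrates the append-then-combine pattern, NOT the attached code.
! gather_sketch, localBuf, offsets and nThreads are hypothetical names.
program gather_sketch
  use omp_lib
  implicit none
  integer, parameter :: n = 1000000
  integer, allocatable :: localBuf(:), offsets(:), outArr(:)
  integer :: i, tid, nThreads, localSize

  !$omp parallel default(shared) private(localBuf, i, tid, localSize)
  tid = omp_get_thread_num()            ! 0-based thread id

  ! One thread allocates the shared offsets array; the implicit barrier
  ! at "end single" makes it visible to all threads.
  !$omp single
  nThreads = omp_get_num_threads()
  allocate(offsets(0:nThreads))
  offsets = 0
  !$omp end single

  ! Each thread appends to its own private buffer (worst-case capacity n).
  allocate(localBuf(n))
  localSize = 0
  !$omp do schedule(static)
  do i = 1, n
     localSize = localSize + 1
     localBuf(localSize) = i
  end do
  !$omp end do

  ! Publish this thread's element count, then wait for all threads.
  offsets(tid + 1) = localSize
  !$omp barrier

  ! One thread converts the counts into running offsets and allocates the output.
  !$omp single
  do i = 1, nThreads
     offsets(i) = offsets(i) + offsets(i - 1)
  end do
  allocate(outArr(offsets(nThreads)))
  !$omp end single

  ! Each thread copies its slice into the shared output array.
  outArr(offsets(tid) + 1 : offsets(tid + 1)) = localBuf(1:localSize)
  deallocate(localBuf)
  !$omp end parallel

  print *, 'size of outArr:', size(outArr)
end program gather_sketch
```

In this sketch the output array is allocated inside the parallel region by a single thread, which is similar in spirit to the workaround mentioned above. Whether it matches the actual change to the attached code is an assumption.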
codeMod.F90 (4.3 KB)
main.F90 (75 Bytes)
Makefile (241 Bytes)
I don’t think the problem is due to a deallocation of “t%sizes” but rather some issue with the allocation of “outArr”.
Here are some modifications I made to the code (see attached). I added a print statement that accesses “t%sizes” before and after the allocation. At -O0 the code segfaults on the first access of “t%sizes”, but at -O2 the segfault is delayed until the first access of outArr. This is an indication that the stack may be getting corrupted, but I’ll need engineering to investigate. Filed as TPR #31662.
Like you, I found hoisting the allocation of outArr outside of the subroutine works around the error.
codeMod.F90 (4.6 KB)
% nvfortran -mp codeMod.F90 main.F90 -O0 ; a.out
t1: 0 1000000
% nvfortran -mp codeMod.F90 main.F90 -O1 ; a.out
t1: 0 1000000
% nvfortran -mp codeMod.F90 main.F90 -O1 -DWORKS; a.out
size of outArr: 1000000