Another bug to report. This one is relatively complex, and other compilers have had issues with this code as well, so it isn’t a surprise that nvfortran has one too.
Basically, the code is like a parallel STL vector class for OpenMP: each thread appends to a thread-local array, and at the end of the parallel region the per-thread arrays are combined into a single output array.
Specifically, what appears to be happening is that the “t%sizes” array is deallocated too early, leading to a seg-fault at the line localSize = t%sizes(1, tid+1) - t%sizes(1, tid). Both ifort and gfortran run this code correctly. I was able to work around the issue by moving the allocate(outArr) directly into the OpenMP region and splitting the finalize command into two separate parts.
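For readers without the full source, the pattern described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from the report: the names sizes, localBuf, nLocal and the filter on multiples of 3 are hypothetical, sizes is 1-D here while the report's t%sizes is 2-D, and outArr is allocated inside the parallel region in the spirit of the workaround. It makes no claim to reproduce the nvfortran failure.

```fortran
! Illustrative sketch only; not the reproducer from the report.
! All names (sizes, localBuf, nLocal, outArr) are hypothetical.
program thread_local_append
  !$ use omp_lib
  implicit none
  integer, parameter :: n = 1000
  integer, allocatable :: sizes(:), localBuf(:), outArr(:)
  integer :: nThreads, tid, i, nLocal, localSize, offset

  ! The "!$" lines are compiled only when OpenMP is enabled,
  ! so the program also builds and runs serially.
  nThreads = 1
  !$ nThreads = omp_get_max_threads()

  ! After the prefix sum below, sizes(t) is the number of
  ! elements owned by threads 0..t-1.
  allocate(sizes(0:nThreads))
  sizes = 0

  !$omp parallel default(shared) &
  !$omp& private(tid, i, nLocal, localSize, offset, localBuf)
  tid = 0
  !$ tid = omp_get_thread_num()

  ! Each thread appends to its own private buffer.
  allocate(localBuf(n))
  nLocal = 0
  !$omp do schedule(static)
  do i = 1, n
    if (mod(i, 3) == 0) then
      nLocal = nLocal + 1
      localBuf(nLocal) = i
    end if
  end do
  !$omp end do

  ! Publish the per-thread count, then wait for every thread.
  sizes(tid + 1) = nLocal
  !$omp barrier

  ! One thread turns counts into offsets and allocates the output.
  !$omp single
  do i = 1, nThreads
    sizes(i) = sizes(i) + sizes(i - 1)
  end do
  allocate(outArr(sizes(nThreads)))
  !$omp end single
  ! (end single carries an implicit barrier)

  ! Same shape as the failing line in the report.
  localSize = sizes(tid + 1) - sizes(tid)
  offset = sizes(tid)
  outArr(offset + 1 : offset + localSize) = localBuf(1:localSize)
  deallocate(localBuf)
  !$omp end parallel

  ! Self-check: count and sum of the multiples of 3 in 1..n.
  if (size(outArr) /= n / 3) error stop 'wrong element count'
  if (sum(outArr) /= 3 * (n / 3) * (n / 3 + 1) / 2) error stop 'wrong sum'
  print *, 'combined elements:', size(outArr)
end program thread_local_append
```

Building with something like gfortran -fopenmp or nvfortran -mp enables the threaded path; without the OpenMP flag the directives are treated as comments and the same code runs on a single thread.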
I don’t think the problem is due to a deallocation of “t%sizes” but rather some issue with the allocation of “outArr”.
Here are some modifications I made to the code (see attached). After adding a print statement that accesses “t%sizes” both before and after the allocation, the code at “-O0” segvs on the first access of “t%sizes”. At “-O2”, however, the segv is delayed until the first access of “outArr”. This suggests that the stack may be getting corrupted, but I’ll need engineering to investigate. Filed as TPR #31662.
Like you, I found hoisting the allocation of outArr outside of the subroutine works around the error.