Explicit deallocation in a subroutine inside a Fortran OpenMP offload kernel crashes

If I deallocate an array inside a subroutine that is called from a Fortran OpenMP offload kernel, I get a crash.

This works fine:

program prototype

  implicit none

  integer :: i, k
  integer :: num_elems
  integer, allocatable, dimension(:) :: testb

  num_elems = 1800

  !$omp target loop
  do i = 1, num_elems

    allocate(testb(1:5))

    do k = 1, 5
      testb(k) = k
    end do

    deallocate(testb)
  end do
  !$omp end target loop
 
end program prototype

But this crashes:

module some_module

    implicit none
contains

subroutine test_allocs(i)
    !$omp declare target
    implicit none
    integer, intent(in) :: i
    integer :: k

    real, allocatable, dimension(:) :: testb
    allocate(testb(1:5))
        
    do k= 1,5
        testb(k) = k
    end do

    deallocate(testb)

end subroutine test_allocs

end module some_module


program prototype

  use some_module

  implicit none

  integer :: i
  integer :: num_elems

  num_elems = 1800

  !$omp target loop
  do i=1,num_elems
        
    call test_allocs(i)
  end do
  !$omp end target loop

end program prototype

If I remove the deallocate, it doesn’t crash. Should I never deallocate explicitly and instead rely on implicit deallocation when the variable goes out of scope?

Relying on implicit deallocation works as a workaround, but I think this is a compiler issue, so I added a report, TPR #37307, and sent it to engineering for review.
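For reference, here is a minimal sketch of that workaround, adapted from the failing example in the question. The only change is that the explicit deallocate is dropped, so the unsaved local allocatable is released automatically when the subroutine returns. It assumes the same compiler and offload flags used for the original test.

```fortran
module some_module

    implicit none
contains

subroutine test_allocs(i)
    !$omp declare target
    integer, intent(in) :: i
    integer :: k

    real, allocatable, dimension(:) :: testb
    allocate(testb(1:5))

    do k = 1, 5
        testb(k) = k
    end do

    ! No explicit deallocate here: an unsaved local allocatable is
    ! deallocated automatically when the subroutine returns.

end subroutine test_allocs

end module some_module


program prototype

  use some_module

  implicit none

  integer :: i
  integer :: num_elems

  num_elems = 1800

  !$omp target loop
  do i = 1, num_elems
    call test_allocs(i)
  end do
  !$omp end target loop

end program prototype
```

This keeps the device-side allocation itself, so it only sidesteps the crash and does nothing about the allocation cost.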

The error is a “double free”. As the routine exits, the compiler-generated code checks whether the array is allocated and, if it is, deallocates it. For some reason that allocation check returns true even though the array has already been deallocated. The same issue occurs when using “ALLOCATED”, so something is not being set correctly after the deallocation and the array still shows as allocated.
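To make the expected behavior concrete, here is a small host-only sketch (no offload directives) of what should happen after a deallocate. The guarded deallocate at the end is only a conceptual stand-in for the check done at routine exit, not the code the compiler actually generates, and the program name check_allocated is made up for this illustration.

```fortran
program check_allocated

  implicit none

  real, allocatable, dimension(:) :: testb

  allocate(testb(1:5))
  testb = 0.0
  deallocate(testb)

  ! After an explicit deallocate, ALLOCATED should report .false.
  if (allocated(testb)) then
    error stop "testb still reports as allocated after deallocate"
  end if

  ! Conceptual stand-in for the exit-time cleanup: deallocate only if
  ! the array is still allocated. With correct status tracking this
  ! branch is skipped, so no second free happens.
  if (allocated(testb)) deallocate(testb)

  print *, "allocated(testb) after deallocate: ", allocated(testb)

end program check_allocated
```

In the failing device case, the status check behaves as if the first deallocate never happened, so the cleanup frees the array a second time.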

Note that device-side allocation is generally discouraged. It’s not illegal and may be necessary in some cases, but our advice is to avoid it if possible. Allocations get serialized, which can negatively impact performance. Also, the device-side heap is fairly small, so it’s easy to run into heap overflows. You can increase the heap size via the environment variable “NV_ACC_CUDA_HEAPSIZE”, but it’s still best to avoid device-side allocation.
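As one possible way to avoid device-side allocation in a case like this, here is a sketch that assumes the work array size is known at compile time and replaces the allocatable with a fixed-size local array. The names fixed_size_module and work_size are hypothetical and introduced only for this example.

```fortran
module fixed_size_module

    implicit none

    ! Hypothetical compile-time size for the per-iteration work array.
    integer, parameter :: work_size = 5

contains

subroutine test_allocs(i)
    !$omp declare target
    integer, intent(in) :: i
    integer :: k

    ! Fixed-size local array: no allocate or deallocate is needed.
    real :: testb(work_size)

    do k = 1, work_size
        testb(k) = k
    end do

end subroutine test_allocs

end module fixed_size_module


program prototype

  use fixed_size_module

  implicit none

  integer :: i
  integer :: num_elems

  num_elems = 1800

  !$omp target loop
  do i = 1, num_elems
    call test_allocs(i)
  end do
  !$omp end target loop

end program prototype
```

If the size is only known at run time, another option is to allocate one larger buffer on the host and pass each iteration its own slice, though whether that fits depends on the real code.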

FYI, TPR #37307 is mostly fixed in our 25.7 release. During testing we found that failures still occur when inlining is enabled (i.e. -Minline), but as long as you don’t need this flag, you should be OK. We’ll work to address the inlining issue in a future release, but didn’t want to delay getting the primary fix to you.