nvfortran: asynchronous allocation fails (segmentation fault) for module variables

Here is my minimal program:

module arrs
  implicit none
  integer(4),allocatable, dimension(:), device           :: module_arr
end module

  program test
    use arrs
    use cudafor
    implicit none
    integer(4), allocatable, dimension(:), device           :: arr
    integer :: stat
    integer(kind=cuda_stream_kind) :: istream
    stat = cudaStreamCreate(istream)
    allocate(module_arr(10), stream=istream)
    allocate(arr(10), stream=istream)
  end program

After compiling with nvfortran -Mcuda -o temp test.f90 and running ./temp, I get Segmentation fault (core dumped). The cause appears to be the asynchronous allocation of module_arr: if I comment out the line allocate(module_arr(10), stream=istream) and asynchronously allocate only the array declared inside the program, it runs without any problem.

Regular synchronous allocation also works for both arrays.
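For reference, here is a minimal sketch of the variant that runs for me, assuming the same compiler flags as above. The module array is allocated synchronously and only the local array uses the stream. The program name test_workaround, the status checks, and the final print are additions for illustration and are not part of the failing program.

```fortran
module arrs
  implicit none
  integer(4), allocatable, dimension(:), device :: module_arr
end module

program test_workaround   ! hypothetical name, for illustration only
  use arrs
  use cudafor
  implicit none
  integer(4), allocatable, dimension(:), device :: arr
  integer :: stat
  integer(kind=cuda_stream_kind) :: istream

  stat = cudaStreamCreate(istream)
  if (stat /= cudaSuccess) stop 'cudaStreamCreate failed'

  ! Module variable: plain (synchronous) allocation, no stream= specifier.
  allocate(module_arr(10))

  ! Local variable: stream-ordered (asynchronous) allocation on istream.
  allocate(arr(10), stream=istream)

  ! Wait for the work queued on the stream before using arr.
  stat = cudaStreamSynchronize(istream)
  if (stat /= cudaSuccess) stop 'cudaStreamSynchronize failed'

  print *, 'both allocations completed'
end program
```

This only sidesteps the crash; it does not explain why the stream= form fails for the module array.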

Is this a bug, or is it expected behavior? Why does it happen? Thanks a lot for any help. I am using nvfortran 22.3-0 and CUDA 11.6.

Probably a bug. I’ve filed a problem report, TPR #31952, and sent it to engineering for investigation.