I got a curious error with a managed derived type data object:
module mpi_comm
  !...
  type pointer_array_hfdev
    real(rp), device, pointer :: r(:, :, :)
  end type
  type(pointer_array), save, private, allocatable :: ptr_post_calc(:)
  type(pointer_array_hfdev), private, allocatable, managed :: ptr_post_calc_hfdev(:)
  type(pointer_array), save, private, allocatable :: ptr_pre_refer(:)
  type(pointer_array_hfdev), private, allocatable, managed :: ptr_pre_refer_hfdev(:)
  !...
contains
  !....
  subroutine mpi_halo_ini
    !....
    allocate(ptr_post_calc_hfdev(max_ptr_halo), stat = istat)
    if (istat /= 0) then
      print *, 'Error during allocation ptr_post_calc_hfdev', istat
      stop 3
    end if
    allocate(ptr_pre_refer_hfdev(max_ptr_halo), stat = istat)
    if (istat /= 0) then
      print *, 'Error during allocation ptr_pre_refer_hfdev', istat
      stop 3
    end if
    do i = 1, max_ptr_halo
      nullify(ptr_post_calc_hfdev(i)%r)
      nullify(ptr_pre_refer_hfdev(i)%r)
    end do
    !...
  end subroutine
Output (line 747 is the first nullify line):
0: Null pointer for ptr_post_calc_hfdev (mpi_comm.f90: 747)
I already tried writing a reproducer, but couldn’t trigger the error with it. It seems to me that the error only happens when there are lots of other things going on in device memory. max_ptr_halo is set to 200, by the way.
I’m at a loss here because the allocate call itself doesn’t report an error (istat comes back as 0). For a managed array, allocate is only a wrapper around cudaMallocManaged, right? I tried using the derived types without the managed attribute but couldn’t get that working. Maybe the runtime gets confused between resident and non-resident memory for some reason?