integer :: npoints
integer, allocatable, dimension(:) :: field_ind, field_displ
integer :: n
real, allocatable :: buff(:)
! exchange data for a field.
type (t_submessage), pointer :: message_in(:) ! Messages to receive from remote ranks and to copy back to the field
type (openmpi_buffer_t), pointer :: mpi_buffer_in(:)
type(t_message), intent(inout) :: message
integer :: i, ierr
! !$acc enter data copyin(message) async             ! equivalent CUDA call (C++)?
! !$acc enter data copyin(message%mpi_buffer_in(:))  ! equivalent CUDA call (C++ or Fortran)?
end module test
In general, an “acc enter data” directive corresponds in CUDA C to allocating the device variable (via cudaMalloc, cudaMallocHost, or cudaMallocManaged, depending on whether the compiler flags select default, pinned, or managed memory). The “copyin” clause additionally copies the data to the device via cudaMemcpy (or cudaMemcpyAsync for the async form).
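To make that mapping concrete without leaving Fortran, here is a minimal sketch (the module and routine names are hypothetical) showing that “enter data copyin” behaves like a device allocation followed by a host-to-device copy. The second routine splits it into “enter data create” (the allocation, roughly a cudaMalloc) and “update device ... async(1)” (the copy, roughly a cudaMemcpyAsync on that queue), followed by “wait(1)”. The CUDA counterparts in the comments are approximate; the exact calls the runtime makes depend on the compiler and its flags.

```fortran
module copyin_sketch
  implicit none
contains

  ! Variant A: one directive allocates the device copy and copies the data in.
  subroutine to_device_copyin(a)
    real, intent(inout) :: a(:)
    !$acc enter data copyin(a)
  end subroutine to_device_copyin

  ! Variant B: the same two steps written separately.
  subroutine to_device_two_steps(a)
    real, intent(inout) :: a(:)
    !$acc enter data create(a)         ! allocation only (roughly cudaMalloc)
    !$acc update device(a) async(1)    ! host-to-device copy (roughly cudaMemcpyAsync)
    !$acc wait(1)                      ! block until queue 1 has finished
  end subroutine to_device_two_steps

  ! Work on the device copy, then copy it back and free the device memory.
  subroutine double_and_release(a)
    real, intent(inout) :: a(:)
    integer :: i
    !$acc parallel loop present(a)
    do i = 1, size(a)
      a(i) = 2.0 * a(i)
    end do
    !$acc exit data copyout(a)
  end subroutine double_and_release

end module copyin_sketch
```

Both variants leave “a” present on the device in the same state, so either can be followed by the same compute region and “exit data”.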
Since there’s a parent-child relationship, the OpenACC runtime will also “attach” the child, i.e. set the device pointer in the parent so that it points to the child’s device copy: the “mpi_buffer_in” device pointer is set in the device copy of “message”. What you are missing is the allocation of each mpi_buffer_in element’s “buff” array on the device, so you should add a loop that allocates these on the device.
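A minimal sketch of that manual deep copy, assuming trimmed-down versions of the question’s types (the type layouts below, the routine names, and the “scale_buffers” kernel are hypothetical, since the full definitions aren’t shown): copy the parent, then the array of children, then each element’s “buff” in a loop, and tear everything down in the reverse order.

```fortran
module deep_copy_sketch
  implicit none

  ! Hypothetical, trimmed types: only the parent -> child -> buff chain is kept.
  type :: openmpi_buffer_t
    real, allocatable :: buff(:)
  end type openmpi_buffer_t

  type :: t_message
    type(openmpi_buffer_t), pointer :: mpi_buffer_in(:) => null()
  end type t_message

contains

  subroutine message_to_device(message)
    type(t_message), intent(inout) :: message
    integer :: i
    ! Parent first: device copy of the t_message struct itself.
    !$acc enter data copyin(message)
    ! Child array of structs; the runtime attaches message%mpi_buffer_in
    ! inside the device copy of message.
    !$acc enter data copyin(message%mpi_buffer_in(:))
    ! The missing step: copy each element's buff, which attaches
    ! mpi_buffer_in(i)%buff inside the device copy of the child.
    do i = 1, size(message%mpi_buffer_in)
      !$acc enter data copyin(message%mpi_buffer_in(i)%buff(:))
    end do
  end subroutine message_to_device

  ! Example compute region that walks the whole chain on the device.
  subroutine scale_buffers(message, factor)
    type(t_message), intent(inout) :: message
    real, intent(in) :: factor
    integer :: i, j
    do i = 1, size(message%mpi_buffer_in)
      !$acc parallel loop default(present)
      do j = 1, size(message%mpi_buffer_in(i)%buff)
        message%mpi_buffer_in(i)%buff(j) = factor * message%mpi_buffer_in(i)%buff(j)
      end do
    end do
  end subroutine scale_buffers

  subroutine message_from_device(message)
    type(t_message), intent(inout) :: message
    integer :: i
    ! Reverse order: bring each buff back, then drop the structs.
    do i = 1, size(message%mpi_buffer_in)
      !$acc exit data copyout(message%mpi_buffer_in(i)%buff(:))
    end do
    !$acc exit data delete(message%mpi_buffer_in(:))
    !$acc exit data delete(message)
  end subroutine message_from_device

end module deep_copy_sketch
```

Compiled without OpenACC enabled, the directives are treated as comments and the same code runs on the host, which is a convenient way to check the logic before moving to the GPU.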
The OpenACC runtime also keeps track of the association between the host and device addresses in a “present” table.
Granted, I don’t have context on how you’re using “message”, but it seems like you’d only want the “buff” array to be a device pointer, assuming you’ll be using CUDA-aware MPI. This would simplify things quite a bit.