How to convert OpenACC loop code/pragmas to CUDA C/C++

Example:
Module test

type t_submessage
integer :: npoints
integer, allocatable, dimension(:) :: field_ind, field_displ
end type

type openmpi_buffer_t
integer :: n
real, allocatable :: buff(:)
end type

! exchange data for a field.
type t_message
type (t_submessage), pointer :: message_in(:) ! Messages to receive from remote ranks and to copy back to the field
type (openmpi_buffer_t), pointer :: mpi_buffer_in(:)
endtype

contains

subroutine message_create_ondevice(message)
type(t_message), intent(inout) :: message
integer :: i, ierr

! !$acc enter data copyin(message) async ==> OpenACC to CUDA C++ call?
! !$acc enter data copyin(message%mpi_buffer_in(:)) ==> OpenACC to CUDA C++/Fortran call?
end subroutine
end module test

In general, an “acc enter data” directive corresponds in CUDA C to allocating the device variable (via cudaMalloc, cudaMallocHost, or cudaMallocManaged, depending on the compiler flag: default, pinned, or managed), with the “copyin” clause also copying the data to the device via cudaMemcpy (or cudaMemcpyAsync).
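As a rough sketch of that mapping, the first directive could look like this in CUDA C++. The `Message` and `MpiBuffer` structs, the `n_in` field, and the `copyin_message` helper are hypothetical stand-ins for the Fortran types (Fortran array descriptors are not modeled), and this is not what the compiler actually generates:

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical C++ mirrors of the Fortran derived types (assumption).
struct MpiBuffer {
    int    n;     // number of elements in buff
    float *buff;  // data pointer
};

struct Message {
    int        n_in;           // number of receive buffers (assumed field)
    MpiBuffer *mpi_buffer_in;  // array of n_in buffers
};

// Rough equivalent of `!$acc enter data copyin(message) async`:
// allocate a device copy of the struct, then start a shallow copy on `stream`.
// h_msg must stay valid until the stream has been synchronized. With pageable
// host memory the copy may behave synchronously with respect to the host.
cudaError_t copyin_message(const Message *h_msg, Message **d_msg,
                           cudaStream_t stream)
{
    cudaError_t err = cudaMalloc(reinterpret_cast<void **>(d_msg),
                                 sizeof(Message));
    if (err != cudaSuccess) return err;
    return cudaMemcpyAsync(*d_msg, h_msg, sizeof(Message),
                           cudaMemcpyHostToDevice, stream);
}
```

The copy is shallow: after it completes, `mpi_buffer_in` in the device copy still holds a host address, so it cannot be dereferenced on the device yet.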

Since there’s a parent-child relationship, the OpenACC runtime will also “attach” the child, i.e. set the device pointer in the parent so that it points to the child: the “mpi_buffer_in” device pointer is set in the device copy of “message”. You are missing the allocation of each mpi_buffer_in element’s “buff” array on the device, so you should add a loop that allocates these on the device.
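Done by hand in CUDA C++, the attach step and the missing loop over the “buff” arrays might look like the sketch below. The structs, the `n_in` field, and `deep_copyin` are hypothetical names chosen for illustration, the copies are synchronous for simplicity, and cleanup on failure is omitted:

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical C++ mirrors of the Fortran derived types (assumption).
struct MpiBuffer { int n; float *buff; };
struct Message   { int n_in; MpiBuffer *mpi_buffer_in; };

// Manual deep copy of a message to the device.
// Returns the device copy, or nullptr if any CUDA call fails.
Message *deep_copyin(const Message &h_msg)
{
    // Host staging copy of the child array; its buff members are
    // overwritten with device addresses before it is copied over.
    std::vector<MpiBuffer> staged(h_msg.mpi_buffer_in,
                                  h_msg.mpi_buffer_in + h_msg.n_in);

    // The loop: allocate and fill each buff array on the device.
    for (int i = 0; i < h_msg.n_in; ++i) {
        size_t bytes = static_cast<size_t>(staged[i].n) * sizeof(float);
        float *d_buff = nullptr;
        if (cudaMalloc(reinterpret_cast<void **>(&d_buff), bytes) != cudaSuccess)
            return nullptr;
        if (cudaMemcpy(d_buff, h_msg.mpi_buffer_in[i].buff, bytes,
                       cudaMemcpyHostToDevice) != cudaSuccess)
            return nullptr;
        staged[i].buff = d_buff;  // "attach" buff inside element i
    }

    // Copy the child array, which now contains device buff pointers.
    size_t child_bytes = staged.size() * sizeof(MpiBuffer);
    MpiBuffer *d_bufs = nullptr;
    if (cudaMalloc(reinterpret_cast<void **>(&d_bufs), child_bytes) != cudaSuccess)
        return nullptr;
    if (cudaMemcpy(d_bufs, staged.data(), child_bytes,
                   cudaMemcpyHostToDevice) != cudaSuccess)
        return nullptr;

    // Copy the parent with its pointer "attached" to the device child array.
    Message staged_msg = h_msg;
    staged_msg.mpi_buffer_in = d_bufs;
    Message *d_msg = nullptr;
    if (cudaMalloc(reinterpret_cast<void **>(&d_msg), sizeof(Message)) != cudaSuccess)
        return nullptr;
    if (cudaMemcpy(d_msg, &staged_msg, sizeof(Message),
                   cudaMemcpyHostToDevice) != cudaSuccess)
        return nullptr;
    return d_msg;
}
```

Patching the pointers in a host staging copy avoids a separate small cudaMemcpy per pointer; the host structures themselves are left unchanged.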

The OpenACC runtime also keeps track of the association between the host and device addresses in a “present” table.
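If you manage the memory yourself in CUDA, you need to keep that host-to-device association yourself. A toy version of such a table is sketched below; `PresentTable` and its methods are hypothetical, host-only code and say nothing about how the OpenACC runtime is actually implemented:

```cuda
#include <cstddef>
#include <cstdint>
#include <map>

// Toy "present" table: maps a host address range to its device allocation.
class PresentTable {
    struct Entry { void *dev; std::size_t bytes; };
    std::map<std::uintptr_t, Entry> entries_;  // keyed by host base address

public:
    // Record that [host, host + bytes) is mirrored at device address dev.
    void add(const void *host, void *dev, std::size_t bytes) {
        entries_[reinterpret_cast<std::uintptr_t>(host)] = Entry{dev, bytes};
    }

    // Forget the mapping that starts at this host base address.
    void remove(const void *host) {
        entries_.erase(reinterpret_cast<std::uintptr_t>(host));
    }

    // Translate a host address (base or interior) to the matching device
    // address; returns nullptr if the address is not present.
    void *lookup(const void *host) const {
        std::uintptr_t p = reinterpret_cast<std::uintptr_t>(host);
        auto it = entries_.upper_bound(p);  // first entry with base > p
        if (it == entries_.begin()) return nullptr;
        --it;                               // last entry with base <= p
        std::size_t offset = p - it->first;
        if (offset >= it->second.bytes) return nullptr;
        return static_cast<char *>(it->second.dev) + offset;
    }
};
```

You would call `add` after each cudaMalloc, `lookup` wherever a host address has to be translated before a kernel launch or copy, and `remove` before the matching cudaFree.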

Granted, I don’t have context on how you’re using “message”, but it seems like you’d only want the “buff” array to be a device pointer, assuming you’ll be using CUDA-aware MPI. This would simplify things quite a bit.