How to convert OpenACC loop code/pragma to CUDA C/C++

anilbommareddy09 · October 12, 2020, 1:59pm

Example:
module test
  implicit none

  type t_submessage
    integer :: npoints
    integer, allocatable, dimension(:) :: field_ind, field_displ
  end type

  type openmpi_buffer_t
    integer :: n
    real, allocatable :: buff(:)
  end type

  ! Exchange data for a field.
  type t_message
    type (t_submessage), pointer :: message_in(:) ! Messages to receive from remote ranks and to copy back to the field
    type (openmpi_buffer_t), pointer :: mpi_buffer_in(:)
  end type

contains

  subroutine message_create_ondevice(message)
    type(t_message), intent(inout) :: message
    integer :: i, ierr

    ! !$acc enter data copyin(message) async ==> OpenACC to CUDA C++ call?
    ! !$acc enter data copyin(message%mpi_buffer_in(:)) ==> OpenACC to CUDA C++/Fortran call?
  end subroutine
end module test

MatColgrove · October 12, 2020, 3:10pm

In general, an “acc enter data” directive corresponds in CUDA C to allocating the device variable (via cudaMalloc, cudaMallocHost, or cudaMallocManaged, depending on the compiler flag, i.e. default, pinned, or managed), with the “copyin” clause also copying the data to the device via cudaMemcpy (or cudaMemcpyAsync).

Since there’s a parent-child relationship, the OpenACC runtime will also “attach” the child, i.e. set the device pointer in the parent so that it points to the child: the “mpi_buffer_in” device pointer is set in the device copy of “message”. You are missing the allocation of each mpi_buffer_in element’s “buff” array on the device, so you should add a loop to allocate these on the device.

The OpenACC runtime also keeps track of the association between host and device addresses in a “present” table.
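If you manage the device memory yourself, you need some equivalent bookkeeping. The following toy model shows the idea of such a table; the class `PresentTable` and its methods are invented for this sketch and say nothing about how the actual OpenACC runtime implements it.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Toy host-to-device address map, keyed by the start of each host range.
class PresentTable {
    struct Entry {
        std::size_t bytes;  // length of the host range
        void* device;       // device address matching the start of the range
    };
    std::map<std::uintptr_t, Entry> entries_;

public:
    // Record that [host, host + bytes) has a device copy at `device`.
    void insert(const void* host, std::size_t bytes, void* device) {
        entries_[reinterpret_cast<std::uintptr_t>(host)] = Entry{bytes, device};
    }

    // Forget the range that starts at `host`.
    void erase(const void* host) {
        entries_.erase(reinterpret_cast<std::uintptr_t>(host));
    }

    // Translate a host address (possibly inside a range) to its device
    // address; returns nullptr if the address is not present.
    void* find(const void* host) const {
        const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(host);
        auto it = entries_.upper_bound(addr);  // first range starting after addr
        if (it == entries_.begin()) return nullptr;
        --it;                                  // last range starting at or before addr
        const std::uintptr_t offset = addr - it->first;
        if (offset >= it->second.bytes) return nullptr;
        return static_cast<char*>(it->second.device) + offset;
    }
};
```

A lookup like `find` is what lets a later reference to a host variable (for example in a “present” clause) resolve to the device copy made by an earlier “enter data”.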

Granted, I don’t have context on how you’re using “message”, but it seems like you’d only want the “buff” array to be a device pointer, assuming you’ll be using CUDA-aware MPI. This would simplify things quite a bit.
