First, remove “pA” from the declare create directive, then add it to a “present” clause on the compute region. Since A and pA point to the same host address, the present check will associate pA with A’s device copy.
% cat test.f90
module Vars
real(8), pointer :: pA(:)
real(8), allocatable, target :: A(:)
!$acc declare create(A)
end module Vars
program foo
use Vars
allocate(A(3))
pA => A
!$acc serial present(pA)
pA(3) = 3.0
pA(1) = 1.0
!$acc end serial
!$acc update host(A)
print *, A
deallocate(A)
end program foo
% nvfortran -acc test.f90 ; a.out
1.000000000000000 0.000000000000000 3.000000000000000
This solution does not work if you need pA in a declare create so that it can be accessed directly from within device routines. In that case, keep pA in the declare create, but then call acc_attach to update the device copy of pA so that it points to the device copy of A.
% cat test2.F90
module Vars
real(8), pointer :: pA(:)
real(8), allocatable, target :: A(:)
!$acc declare create(A,pA)
contains
subroutine setVal(idx,val)
!$acc routine seq
integer, value :: idx
real(8), value :: val
pA(idx)=val
end subroutine setVal
end module Vars
program foo
use Vars
#ifdef _OPENACC
use openacc
#endif
allocate(A(3))
pA => A
#ifdef _OPENACC
call acc_attach(pA)
#endif
!$acc serial present(pA)
call setVal(3,3.0_8)
call setVal(1,1.0_8)
!$acc end serial
!$acc update host(A)
print *, A
deallocate(A)
end program foo
% nvfortran -acc test2.F90 ; a.out
1.000000000000000 0.000000000000000 3.000000000000000
This is very helpful. Solved another issue for my OpenACC port.
I only have the OpenACC API. Can you recommend a practical reference or a book to help me learn the programming nuances? I don’t think I would have figured out the pointer issue without your help.
I wrote the Data Management chapter (Chapter 5) in Parallel Programming with OpenACC. Most of the examples are written in C, though it might still be helpful for understanding some of the concepts. The examples from the book are available at no cost at: https://github.com/rmfarber/ParallelProgrammingWithOpenACC/tree/master/Chapter05
It looks like you can also remove both the declare line and the update line; the compiler will copy in the data the kernel needs.
Here is the code:
module Vars
real(8), pointer :: pA(:)
real(8), allocatable, target :: A(:)
integer n
!!$acc declare create(A)
end module Vars
program foo
use Vars
n=3
allocate(A(n))
pA => A
!!$acc serial present(pA)
!$acc serial
A(1)=1
pA(2)=2
pA(3)=3
!$acc end serial
!!$acc update host(A)
!!$acc kernels
! do i=1, n
! pA(i) = i
! end do
!!$acc end kernels
print *, A
deallocate(A)
end program foo
Yes, the compiler will do an implicit copy of the data. However, this is bad for performance, since the copy is done every time the kernel is called. It's not an issue here, but in a real code the program would end up spending most of its time copying data.
Ideally, you want to copy the data once at the beginning of the program and once at the end, and have all computation on the data performed on the device. Data movement is one of the biggest performance bottlenecks in GPU programming, so it’s best to minimize it as much as possible.
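As a rough illustration of the "copy once" idea, here is a minimal sketch that wraps many kernel launches in a single structured data region. It is not taken from the code above: the program name, the array `a`, and the sizes `n` and `nsteps` are made up for the example, and a real application might prefer `declare create` or unstructured `enter data`/`exit data` instead.

```fortran
program data_region
  implicit none
  ! n and nsteps are hypothetical sizes chosen only for illustration
  integer, parameter :: n = 1000, nsteps = 100
  real(8), allocatable :: a(:)
  integer :: i, step

  allocate(a(n))
  a = 0.0_8

  ! One host-to-device copy on entry to the data region,
  ! one device-to-host copy on exit.
  !$acc data copy(a)
  do step = 1, nsteps
     ! present(a) asserts the data is already on the device,
     ! so no transfer happens for each kernel launch.
     !$acc parallel loop present(a)
     do i = 1, n
        a(i) = a(i) + 1.0_8
     end do
  end do
  !$acc end data

  ! Every element has been incremented nsteps times.
  print *, a(1), a(n)
  deallocate(a)
end program data_region
```

Built with something like `nvfortran -acc`, the array should move to the device once and back once, no matter how many times the loop launches the kernel. Without the data region, each of the `nsteps` launches would fall back to the implicit copy described above.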