Hello .
I’m using pgi/16/9 ( cuda8.0 ) and testing managed memory in fortran .
When I tried to add the '-ta:…managed’ option , the big code I’m porting with openacc compute wrong .
I found the problem using PGI_ACC_DEBUG=1 .
It’s a side effect of using local variable in subroutine in fortran
if declared with
!$acc declare create(Y)
versus
!$acc data create(Y)
...
!$acc end data
With the “declare” statement the local variable is not managed correctly !
Here is an example , were a local automatic array Y is computed on the GPU but it’s value is not update automatically as expected with managed memory to the CPU .
module mode_sub
implicit none
contains
subroutine sub_declare(X,N)
implicit none
INTEGER :: N
REAL ,DIMENSION (:,:,:) :: X
! local
REAL , DIMENSION(N,N,N) :: Y_DECLARE
!$acc declare create(Y_DECLARE)
! Work on GPU
!$acc kernels
Y_DECLARE = N**2
!$acc end kernels
!$acc update host(Y_DECLARE)
! Work on host
X = X + Y_DECLARE
end subroutine sub_declare
subroutine sub_data(X,N)
implicit none
INTEGER :: N
REAL ,DIMENSION (:,:,:) :: X
! local
REAL , DIMENSION(N,N,N) :: Y_DATA
!$acc data create(Y_DATA)
! Work on GPU
!$acc kernels
Y_DATA = N**2
!$acc end kernels
!$acc update host(Y_DATA)
! Work on host
X = X + Y_DATA
!$acc end data
end subroutine sub_data
end module mode_sub
program hello_declare_data_create
USE MODE_SUB
IMPLICIT NONE
INTEGER , PARAMETER :: N = 16
REAL , ALLOCATABLE ,DIMENSION (:,:,:) :: X
ALLOCATE (X(N,N,N))
X= 1.0
CALL SUB_DECLARE(X,N)
Print*,"SUB_DECLARE(X,N)=",X(N,N,N)
X= 1.0
CALL SUB_DATA(X,N)
Print*,"SUB_DATA(X,N) =", X(N,N,N)
end program hello_declare_data_create
Without the managed option all is OK
pgf90 -g -ta=host,tesla hello_managed_declare_data_create.f90 -o hello_managed_declare_data_create
hello_managed_declare_data_create
SUB_DECLARE(X,N)= 257.0000
SUB_DATA(X,N) = 257.0000
But compiled with managed memory :
pgf90 -g -ta=host,tesla,> managed > hello_managed_declare_data_create.f90 -o hello_managed_declare_data_create
hello_managed_declare_data_create
SUB_DECLARE(X,N)= > 1.000000 >
SUB_DATA(X,N) = 257.0000
I found the trick after digging a lot ( the code is very big ) .
When activating PGI_ACC_DEBUG=1 , I found that the problem is due to the fact that with the declare statement the array Y_DECLARE is allocated with the mirror function pgi_uacc_mirror_alloc , and the Y_DATA one is not !
PGI_ACC_DEBUG=1./hello_managed_declare_data_create |& egrep ‘alloc.*y_declare’
pgi_uacc_mirror_alloc> (size=4096,elemsize=4,hosthandle=0x701fc5010,lineno=0,name=y_declare)
pgi_uacc_mirror_alloc(size=4096,elemsize=4,lineno=0,name=y_declare) returns 0x7020c0000
pgi_uacc_mirror_dealloc(ptr=0x7020c0000,lineno=22,name=y_declare)
REM : The problem is identical with local allocatable array
Bye
Juan