managed memory : side effect of declare versus data create

Hello .

I’m using pgi/16/9 ( cuda8.0 ) and testing managed memory in fortran .

When I tried to add the '-ta:…managed’ option , the big code I’m porting with openacc compute wrong .

I found the problem using PGI_ACC_DEBUG=1 .

It’s a side effect of using local variable in subroutine in fortran

if declared with

 !$acc declare create(Y)

versus

!$acc data create(Y)
 ...
 !$acc end data

With the “declare” statement the local variable is not managed correctly !

Here is an example , were a local automatic array Y is computed on the GPU but it’s value is not update automatically as expected with managed memory to the CPU .

module mode_sub
  implicit none 
contains
  subroutine sub_declare(X,N)
    implicit none 
    INTEGER                 :: N
    REAL ,DIMENSION (:,:,:) :: X
    ! local
    REAL , DIMENSION(N,N,N) :: Y_DECLARE
    !$acc declare create(Y_DECLARE)
    
    ! Work on GPU    
    !$acc kernels 
    Y_DECLARE = N**2
    !$acc end kernels
    !$acc update host(Y_DECLARE)
    
    ! Work on host
    
    X = X + Y_DECLARE    
  end subroutine sub_declare
  
  subroutine sub_data(X,N)
    implicit none 
    INTEGER                 :: N
    REAL ,DIMENSION (:,:,:) :: X
    ! local
    REAL , DIMENSION(N,N,N) :: Y_DATA
    
    !$acc data create(Y_DATA)
    
    ! Work on GPU    
    !$acc kernels 
    Y_DATA = N**2
    !$acc end kernels
    !$acc update host(Y_DATA)
    
    ! Work on host
    
    X = X + Y_DATA

    !$acc end data
    
  end subroutine sub_data
  
end module mode_sub

program hello_declare_data_create
  
  USE MODE_SUB
  
  IMPLICIT NONE
  
  INTEGER , PARAMETER                   :: N = 16
  REAL , ALLOCATABLE ,DIMENSION (:,:,:) :: X
  
  ALLOCATE (X(N,N,N))
  
  X= 1.0
  CALL SUB_DECLARE(X,N)
  Print*,"SUB_DECLARE(X,N)=",X(N,N,N)
  
  X= 1.0
  CALL SUB_DATA(X,N)
  Print*,"SUB_DATA(X,N)   =", X(N,N,N)
  
end program hello_declare_data_create

Without the managed option all is OK

pgf90 -g -ta=host,tesla hello_managed_declare_data_create.f90 -o hello_managed_declare_data_create



hello_managed_declare_data_create
SUB_DECLARE(X,N)= 257.0000
SUB_DATA(X,N) = 257.0000

But compiled with managed memory :

pgf90 -g -ta=host,tesla,> managed > hello_managed_declare_data_create.f90 -o hello_managed_declare_data_create



hello_managed_declare_data_create
SUB_DECLARE(X,N)= > 1.000000 >
SUB_DATA(X,N) = 257.0000

I found the trick after digging a lot ( the code is very big ) .

When activating PGI_ACC_DEBUG=1 , I found that the problem is due to the fact that with the declare statement the array Y_DECLARE is allocated with the mirror function pgi_uacc_mirror_alloc , and the Y_DATA one is not !

PGI_ACC_DEBUG=1./hello_managed_declare_data_create |& egrep ‘alloc.*y_declare’
pgi_uacc_mirror_alloc> (size=4096,elemsize=4,hosthandle=0x701fc5010,lineno=0,name=y_declare)
pgi_uacc_mirror_alloc(size=4096,elemsize=4,lineno=0,name=y_declare) returns 0x7020c0000
pgi_uacc_mirror_dealloc(ptr=0x7020c0000,lineno=22,name=y_declare)

REM : The problem is identical with local allocatable array


Bye

Juan

Hi Juan,

Thanks for the report. This is a known problem (TPR#23052) where the “declare” directive does not work properly when using CUDA Unified Memory (UVM). I’ve added your details to the problem report.

  • Mat