Hi Mat,
This topic is a continuation of this one. The problem there was how PGI handles private arrays, so I had to convert the private arrays into global arrays of an appropriate size. That dramatically increased the memory requirements on the CPU side. So I’d like to extend that topic and ask:
What is the correct way to work with “private” arrays?
In my kernel I have hundreds of private arrays; RHO, CPM, XXLV and XXLS are only a few of them. These arrays are allocated on the GPU with the “acc data create” directive, so allocating and freeing the memory takes a long time. I’d therefore like to allocate one big chunk of memory and split it among the “private” arrays.
Code example (pointer assignments inside the loop):
REAL, DIMENSION(ITS+ITE,JTS+JTE,KTS+KTE,4), TARGET :: TMP_BUF
REAL, DIMENSION(:,:,:), POINTER :: XXLS
REAL, DIMENSION(:,:,:), POINTER :: XXLV
REAL, DIMENSION(:,:,:), POINTER :: CPM
REAL, DIMENSION(:,:,:), POINTER :: RHO
...
!$acc kernels
!$acc loop independent collapse(2) gang vector(16)
do i=its,ite ! i loop (east-west)
do j=jts,jte ! j loop (north-south)
RHO => TMP_BUF(:,:,:,1)
CPM => TMP_BUF(:,:,:,2)
XXLV => TMP_BUF(:,:,:,3)
XXLS => TMP_BUF(:,:,:,4)
This results in some errors; see /track/?id=263. With the pointer assignments moved ahead of the data region instead:
RHO => TMP_BUF(:,:,:,1)
CPM => TMP_BUF(:,:,:,2)
XXLV => TMP_BUF(:,:,:,3)
XXLS => TMP_BUF(:,:,:,4)
!!$acc data create(LTRUE,LAMI,CPM,RHO,XXLV,XXLS, ACN) &
!$acc data create(LTRUE,LAMI,TMP_BUF, ACN) &
!$acc present(CPM,RHO,XXLV,XXLS, &
....
!$acc kernels
!$acc loop independent collapse(2) gang vector(16)
do i=its,ite ! i loop (east-west)
do j=jts,jte ! j loop (north-south)
This results in:
FATAL ERROR: data in PRESENT clause was not found on device 1: name=xxls
In the last case, if I leave RHO, CPM, etc. in the data copy section, PGI generates a COPY statement for each variable.
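For comparison, here is a minimal sketch of a pointer-free variant that I could fall back to: the fourth dimension of the buffer is indexed directly, so only TMP_BUF itself has to appear in a data clause. The slot names IRHO, ICPM, IXXLV and IXXLS, the subroutine name, and the loop body are hypothetical placeholders, not my real kernel, and I am assuming the buffer bounds can be written as its:ite, jts:jte, kts:kte.

```fortran
! Sketch only: the IRHO..IXXLS slot names and the loop body are hypothetical
! placeholders, not the real kernel.
module tmp_buf_sketch
  implicit none
  integer, parameter :: IRHO = 1, ICPM = 2, IXXLV = 3, IXXLS = 4
contains
  subroutine fill_fields(tmp_buf, its, ite, jts, jte, kts, kte)
    integer, intent(in) :: its, ite, jts, jte, kts, kte
    real, intent(out) :: tmp_buf(its:ite, jts:jte, kts:kte, 4)
    integer :: i, j, k

    ! One device allocation covers all four fields.
    !$acc data create(tmp_buf)
    !$acc kernels
    !$acc loop independent collapse(2)
    do i = its, ite        ! i loop (east-west)
      do j = jts, jte      ! j loop (north-south)
        !$acc loop seq
        do k = kts, kte
          ! Each "private" array is one slot of the last dimension.
          tmp_buf(i, j, k, IRHO)  = 1.0
          tmp_buf(i, j, k, ICPM)  = 2.0
          tmp_buf(i, j, k, IXXLV) = 3.0
          tmp_buf(i, j, k, IXXLS) = 4.0
        end do
      end do
    end do
    !$acc end kernels
    ! Copy back only so the result can be inspected on the host.
    !$acc update host(tmp_buf)
    !$acc end data
  end subroutine fill_fields
end module tmp_buf_sketch
```

This avoids the PRESENT lookup on the pointer names, but it would mean rewriting every reference to RHO, CPM, etc. inside the kernel, which is why I would still prefer a way to make the pointer version work.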
P.S. I have not received an answer for /track/?id=261.