Dear developers,
I am encountering a very weird situation. I have a massive scientific code that I want to parallelise using OpenACC. My strategy is straightforward: copy all arrays to the GPU with !$ACC DATA COPYIN at the very beginning and do every computation on the GPU. Here is a stripped-down version of the code:
PROGRAM OFFLOAD
USE OMP_LIB
USE DEFINITION
IMPLICIT NONE
! integer !
integer :: j, k, l, i, n
! Timing variables (to compare runs with and without OpenMP)
INTEGER :: time_start, time_end
INTEGER :: cr, cm
REAL*8 :: rate
!!!
CALL OMP_SET_NUM_THREADS(64)
!!!
CALL system_clock(count_rate=cr)
CALL system_clock(count_max=cm)
rate = REAL(cr)
!!!
CALL INITIAL
!!!
!$acc data copyin(cons, prim, flux)
!!!
CALL system_clock(time_start)
DO n = 1, 100
WRITE(*,*) n
CALL UtoF
END DO
CALL system_clock(time_end)
WRITE(*,*) 'Preparation = ', REAL(time_end - time_start) / rate
!!!
!$acc end data
!!!
! check answer !
WRITE(*,*) prim(1,2,3,4)
!!!
END PROGRAM
And
SUBROUTINE INITIAL
USE OMP_LIB
USE DEFINITION
IMPLICIT NONE
! integer !
integer :: j, k, l
DO j = -2, nx_2 + 3
DO k = -2, ny_2 + 3
DO l = -2, nz_2 + 3
cons(imin2:imax2,j,k,l) = 3.0d0
prim(imin2:imax2,j,k,l) = 4.0d0
END DO
END DO
END DO
END SUBROUTINE
And
SUBROUTINE UtoF
USE OMP_LIB
USE DEFINITION
IMPLICIT NONE
! integer !
integer :: j, k, l
!$acc data present(cons, flux, prim)
!$OMP PARALLEL DO COLLAPSE(3) SCHEDULE(STATIC)
!$acc parallel loop gang
DO j = -2, nx_2 + 3
!$acc loop worker
DO k = -2, ny_2 + 3
!$acc loop vector
DO l = -2, nz_2 + 3
flux(imin2:imax2,j,k,l) = cons(imin2:imax2,j,k,l)**2 + prim(imin2:imax2,j,k,l)
END DO
END DO
END DO
!$acc end parallel loop
!$OMP END PARALLEL DO
!$acc end data
END SUBROUTINE
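The DEFINITION module is not included in the post. To compile the reproducer on its own, a minimal hypothetical stand-in along these lines could be used. The values imin2:imax2 = 1:5 and nx_2 = ny_2 = nz_2 = 32 are arbitrary assumptions, and the arrays are declared as fixed-size module arrays whose ranges match the -2 .. n+3 loop bounds used in INITIAL and UtoF. The real module may well declare them differently (for example as ALLOCATABLE).

```fortran
! Hypothetical stand-in for the DEFINITION module (not shown in the post).
! All sizes below are illustrative assumptions, not the real values.
MODULE DEFINITION
  IMPLICIT NONE
  ! Range of the first index (assumed here to index the stored variables)
  INTEGER, PARAMETER :: imin2 = 1, imax2 = 5
  ! Grid sizes; the arrays span -2 .. n+3 in each spatial direction
  INTEGER, PARAMETER :: nx_2 = 32, ny_2 = 32, nz_2 = 32
  ! Fixed-size module arrays, indexed (variable, x, y, z)
  DOUBLE PRECISION, DIMENSION(imin2:imax2, -2:nx_2+3, -2:ny_2+3, -2:nz_2+3) :: &
       cons, prim, flux
END MODULE DEFINITION
```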
Here is the thing: once the arrays are placed on the device with !$ACC DATA COPYIN in the main program, I expect no further data transfers between host and device. However, when I profile the program with nvprof, I see host-device transfers exactly at the beginning and the end of subroutine UtoF, where I explicitly declare the arrays as present (!$acc data present(cons, flux, prim)).
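For comparison, the same update can be written with default(present) placed directly on the compute construct. The sketch below is only an illustration under stated assumptions: the names UTOF_SKETCH and UtoF_present are hypothetical, the arrays are passed as assumed-shape arguments instead of coming from DEFINITION, and the loops are reordered so the first index (contiguous in Fortran's column-major layout) is innermost. Like the present clause, default(present) should make the runtime stop with an error rather than silently copy an array that is not already on the device.

```fortran
! Hypothetical sketch: the UtoF update with default(present) on the
! compute construct. Arrays are passed in instead of using DEFINITION.
MODULE UTOF_SKETCH
  IMPLICIT NONE
CONTAINS
  SUBROUTINE UtoF_present(cons, prim, flux)
    DOUBLE PRECISION, INTENT(IN)    :: cons(:,:,:,:), prim(:,:,:,:)
    DOUBLE PRECISION, INTENT(INOUT) :: flux(:,:,:,:)
    INTEGER :: i, j, k, l

    ! default(present): arrays referenced in the construct must already
    ! be on the device; none of them is copied implicitly.
    ! collapse(4) merges all four loops into one parallel iteration space.
    !$acc parallel loop collapse(4) default(present)
    DO l = 1, SIZE(flux, 4)
      DO k = 1, SIZE(flux, 3)
        DO j = 1, SIZE(flux, 2)
          ! First index innermost: contiguous in memory
          DO i = 1, SIZE(flux, 1)
            flux(i, j, k, l) = cons(i, j, k, l)**2 + prim(i, j, k, l)
          END DO
        END DO
      END DO
    END DO
  END SUBROUTINE UtoF_present
END MODULE UTOF_SKETCH
```

The caller is still responsible for the data region, for example !$acc data copyin(cons, prim) copy(flux) around the call; without OpenACC enabled the directives are ignored and the loop runs on the host.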
Is there any way to avoid these unwanted transfers?
Thanks!