OpenACC reduction implies copy?

Hi,

I have a standard OpenACC reduction loop like this:

      s = 0.
!$ACC PARALLEL LOOP REDUCTION(+:s) PRESENT(x,y)                                         
      do i=1,n
         s = s + x(i)*y(i)
      enddo

and the output of PGI_ACC_TIME for the function is

  vlxy_acc  NVIDIA  devicenum=0
    time(us): 111,062
    2209: compute region reached 13594 times
        2209: kernel launched 13594 times
            grid: [1029]  block: [128]
            elapsed time(us): total=223,901 max=50 min=15 avg=16
        2209: reduction kernel launched 13594 times
            grid: [2]  block: [256]
            elapsed time(us): total=165,437 max=44 min=11 avg=12
    2209: data region reached 54376 times
        2209: data copyin transfers: 13594
             device time(us): total=41,611 max=21 min=3 avg=3
        2216: data copyout transfers: 13594
             device time(us): total=69,451 max=36 min=4 avg=5

So there are data copyin and copyout transfers for the scalar “s”, which is not intended, since “s” is only used on the device. I just read the OpenACC blog post at

https://www.openacc.org/blog/whats-new-openacc-27

"… the reduction clause implies copy(s) on the compute construct, "

Is there any method to eliminate the implicit “copy” ?

Thanks. /JG

Hi JG,

Is there any method to eliminate the implicit “copy” ?

Yes. Put “s” in a data region. Since it is then already present on the device, it won't be implicitly copied to or from the host around the reduction.

Something like:

      s = 0.
!$ACC DATA COPYIN(s)

!$ACC PARALLEL LOOP REDUCTION(+:s) PRESENT(x,y)                                         
      do i=1,n
         s = s + x(i)*y(i)
      enddo
! s won't be copied back since it's in a data region
...
! Another compute region that uses "s" (read only, so no reduction clause;
! a reduction(+:s) here would start from a private copy initialized to 0)
!$ACC PARALLEL LOOP PRESENT(x,s)
      do i=1,n
         x(i) = x(i)/s
      enddo

!$ACC END DATA
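
If the host does need the value of “s” at some point, one option is an explicit update inside the data region, so the transfer happens only where you ask for it. Below is a minimal, self-contained sketch of the same pattern. The program name, the array size and the initial values are made up for illustration, and the data clauses for x and y are an assumption about how your arrays get to the device.

```fortran
program dot_on_device
  implicit none
  integer, parameter :: n = 1024      ! illustrative size, not from the original code
  real :: x(n), y(n), s
  integer :: i

  x = 1.0
  y = 2.0
  s = 0.0

  ! x, y and s stay on the device for the whole data region.
  ! copyin(s) sends the initial 0.0 once and never copies s back.
  !$acc data copy(x) copyin(y, s)

  ! s is already present, so the copy implied by the reduction moves no data.
  !$acc parallel loop reduction(+:s) present(x, y)
  do i = 1, n
     s = s + x(i)*y(i)
  end do

  ! s is only read here, so it is listed as present, not as a reduction.
  !$acc parallel loop present(x, s)
  do i = 1, n
     x(i) = x(i)/s
  end do

  ! Explicit device-to-host transfer, only because the host prints s below.
  !$acc update self(s)

  !$acc end data

  print *, 's =', s, ' x(1) =', x(1)
end program dot_on_device
```

If your reduction sits in a routine that is called many times, the data region would have to enclose the calls (or s could be created with unstructured enter data / exit data directives in the caller), otherwise s is still transferred on every call.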

Hope this helps,
Mat