data movement time


I compiled and ran my code with pgfortran v10.3 and found that the accelerated version is much slower than the serial version.
I recently got access to a machine with v10.9 and found a big difference in the reported data movement time.

The structure of my code is:

!$acc data region

do time-loop        

!$acc region
   do i-loop
!$acc end region

!$acc region
   do j-loop
!$acc end region

!$acc updateout(a)

 do k-loop   ! k-loop on CPU

!$acc updatein(b)

!$acc region
   do l-loop
!$acc end region

end do ! time-loop

!$acc end data region


57: region entered 1 time
        time(us): total=799971 init=693804 region=106167
        w/o init: total=106167 max=106167 min=106167 avg=106167


57: region entered 1 time
        time(us): total=50128612 init=60125 region=50068487
                  kernels=34054 data=50034433

Line 57 is the "!$acc data region" directive.
The second profile (data=50034433) is from v10.3. The data movement time it reports is far longer than what v10.9 reports.

Is there a known problem in v10.3 with parallelizing code structured as shown above?



It’s actually not a problem with the data movement itself. Rather, in 10.3 the data movement times reported by the profiler are incorrect when data regions are used. This has since been corrected.
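
One way to cross-check a suspicious data= figure is to time the data region yourself with a wall-clock timer and compare. Below is a minimal sketch, assuming a structure like the one in the question. The array names, sizes, and loop bodies are placeholders rather than the original code, and the directive spellings (update host / update device) are the ones I would expect from the PGI Accelerator model, so they may need adjusting for a given release.

```fortran
program time_data_region
  implicit none
  ! Hypothetical sizes and arrays, chosen only for illustration.
  integer, parameter :: n = 1000000, nsteps = 10
  real, allocatable :: a(:), b(:)
  ! Default-kind integers keep system_clock portable to older compilers.
  integer :: t0, t1, rate
  integer :: i, step

  allocate(a(n), b(n))
  a = 1.0
  b = 0.0

  call system_clock(count_rate=rate)
  call system_clock(t0)

!$acc data region copy(a, b)
  do step = 1, nsteps

!$acc region
    do i = 1, n
      a(i) = a(i) + b(i)
    end do
!$acc end region

    ! Bring a back to the host for the CPU-only loop.
!$acc update host(a)

    do i = 1, n          ! runs on the CPU
      b(i) = 0.5 * a(i)
    end do

    ! Send the updated b back to the accelerator.
!$acc update device(b)

  end do
!$acc end data region

  call system_clock(t1)

  ! Wall-clock time for the whole data region, transfers included.
  print '(a, f10.3, a)', 'data region wall-clock: ', &
        real(t1 - t0) / real(rate), ' s'
  print *, 'a(1) = ', a(1)   ! use the result so the loops are not optimized away
end program time_data_region
```

If the wall-clock figure is far smaller than the data= time the profiler prints for the same region, the profiler number is the one to distrust. Without the accelerator target flag the directives are treated as comments, so the same source also gives a serial baseline.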

  • Mat