I use pgfortran v10.3 to compile and run my accelerated code, and find that it is much slower than the serial version.
I recently got access to a computer with v10.9 installed, and found a big difference in the data movement time.
The structure of my code is:
!$acc data region
do time-loop
   !$acc region
   do i-loop
   !$acc end region
   !$acc region
   do j-loop
   !$acc end region
   !$acc updateout(a)
   do k-loop   ! k-loop on CPU
   !$acc updatein(b)
   !$acc region
   do l-loop
   !$acc end region
end do   ! time-loop
!$acc end data region
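
To make the outline concrete, here is a minimal sketch of the same structure as a complete program. Only the array names a and b come from my code; the program name, array sizes, step count, and loop bodies are made up for illustration. The copy clause on the data region, and the "update host" / "update device" spelling of the updateout/updatein steps, are assumptions about how this would be written, not a copy of my real source.

```fortran
program data_region_sketch
  implicit none
  ! Hypothetical sizes, chosen only for illustration
  integer, parameter :: n = 1000, nsteps = 10
  real :: a(n), b(n)
  integer :: i, j, k, l, step

  a = 0.0
  b = 1.0

  ! Assumed clause: keep a and b on the device for the whole time loop
  !$acc data region copy(a, b)
  do step = 1, nsteps
     !$acc region
     do i = 1, n
        a(i) = a(i) + b(i)
     end do
     !$acc end region

     !$acc region
     do j = 1, n
        a(j) = 2.0 * a(j)
     end do
     !$acc end region

     ! Assumed spelling of the updateout(a) step: device -> host
     !$acc update host(a)
     do k = 1, n
        b(k) = 0.5 * a(k)      ! k-loop runs on the CPU
     end do
     ! Assumed spelling of the updatein(b) step: host -> device
     !$acc update device(b)

     !$acc region
     do l = 1, n
        a(l) = a(l) - b(l)
     end do
     !$acc end region
  end do
  !$acc end data region

  print *, 'a(1) =', a(1), ' b(1) =', b(1)
end program data_region_sketch
```

A compiler without accelerator support treats the !$acc lines as comments, so the same file also serves as the serial reference. Inside the time loop, the only transfers I intend are a to the host and b to the device.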
57: region entered 1 time
    time(us): total=799971 init=693804 region=106167
              data=15816
    w/o init: total=106167 max=106167 min=106167 avg=106167

57: region entered 1 time
    time(us): total=50128612 init=60125 region=50068487
              kernels=34054 data=50034433
where line 57 is the line containing “!$acc data region”.
The time spent in data movement with v10.3 is much longer than with v10.9 (data=50034433 us versus data=15816 us in the profiles above).
I was wondering whether v10.3 has a problem accelerating code with the structure shown above.