Does OpenACC recognize "implicit" loops (array syntax) in Fortran?

For example,

        real, dimension(numparts), intent(inout) :: tx, ty, tz
        real, dimension(numparts), intent(inout) :: tvx, tvy, tvz

!$acc kernels
        tx = tx + (1.0-theta)*dtsc*tvx
        ty = ty + (1.0-theta)*dtsc*tvy
        tz = tz + (1.0-theta)*dtsc*tvz
!$acc end kernels

Is there any reason to write out the loop explicitly, as in

!$acc kernels loop
do i=1,numparts
  tx(i)=tx(i)+(1.0-theta)*dtsc*tvx(i)
  ...
enddo

Unrelated side question:

Accelerator Kernel Timing data
/home/ben/Documents/benscode/OA_revised/particle_routines.f90
  ppush  NVIDIA  devicenum=0
    time(us): 6,466,209
    19: compute region reached 8125 times
        19: data copyin reached 73125 times
             device time(us): total=3,026,122 max=460 min=34 avg=41
        20: kernel launched 8125 times
            grid: [391]  block: [128]
             device time(us): total=297,147 max=122 min=29 avg=36
            elapsed time(us): total=349,437 max=539 min=39 avg=43
        35: kernel launched 8125 times
            grid: [391]  block: [128]
             device time(us): total=94,686 max=29 min=10 avg=11
            elapsed time(us): total=144,757 max=359 min=16 avg=17
        37: data copyout reached 73125 times
             device time(us): total=3,048,254 max=392 min=34 avg=41

From the line “time(us): 6,466,209” near the top, what is this time referring to? Elapsed time spent on the GPU? Or elapsed time including both host and GPU time?

Hi Brush,

Is there any reason to write out the loop explicitly,

Personally, I’d leave it as array syntax. The less modification needed to the original code, the better. But the reasons to write an explicit loop instead of the implicit array-syntax loop are:

  • You want or need to explicitly set the accelerator loop schedule.
  • You want to group these statements together in a single kernel. Currently three kernels would be generated, one per array assignment.
  • You wish to use “parallel” instead of “kernels”.
  • Not all OpenACC compilers may support acceleration of array syntax.
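
As a rough illustration of the first three points, here is a minimal sketch of the explicit-loop form. The program name, the value of numparts, the initial data, the data clauses, and the vector length of 128 are all assumptions made for the example, not taken from the original routine. Without an OpenACC compiler the directives are treated as comments and the loop simply runs on the host.

```fortran
program ppush_sketch
  implicit none
  ! Hypothetical sizes and constants, chosen only for illustration.
  integer, parameter :: numparts = 50000
  real, parameter :: theta = 0.5, dtsc = 0.01
  real, dimension(numparts) :: tx, ty, tz, tvx, tvy, tvz
  integer :: i

  tx = 1.0;   ty = 2.0;   tz = 3.0
  tvx = 10.0; tvy = 20.0; tvz = 30.0

  ! One explicit loop means one kernel for all three updates.
  ! "parallel loop" is used instead of "kernels", and the schedule
  ! (gang/vector, vector length 128) is set by hand as an assumption.
  !$acc parallel loop gang vector vector_length(128) &
  !$acc   copy(tx, ty, tz) copyin(tvx, tvy, tvz)
  do i = 1, numparts
     tx(i) = tx(i) + (1.0 - theta)*dtsc*tvx(i)
     ty(i) = ty(i) + (1.0 - theta)*dtsc*tvy(i)
     tz(i) = tz(i) + (1.0 - theta)*dtsc*tvz(i)
  end do

  ! Simple self-check against the values expected from the initial data.
  if (any(abs(tx - 1.05) > 1.0e-5)) error stop "tx mismatch"
  if (any(abs(ty - 2.10) > 1.0e-5)) error stop "ty mismatch"
  if (any(abs(tz - 3.15) > 1.0e-5)) error stop "tz mismatch"
  print *, "all three arrays updated in one loop"
end program ppush_sketch
```

Whether the fused loop is actually faster depends on the compiler and the data movement around it, so treat this as a starting point to compare against the array-syntax version rather than a recommendation.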


From the line “time(us): 6,466,209” near the top, what is this time referring to?

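It’s the sum of the device time (compute and data transfer) spent in this routine. In your output that is 3,026,122 (copyin) + 297,147 (kernel at line 20) + 94,686 (kernel at line 35) + 3,048,254 (copyout) = 6,466,209 us. The “elapsed time” entries, which are measured from the host, are not part of that total.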

- Mat