Relation between loop independent and routine seq

In the example shown below, please see this compute region:

!$acc kernels copy(arr)
!$acc loop independent
  do i=1,10
    !arr(i) = i
    call add(arr,i)
  end do
!$acc end kernels

a) If I use loop independent with the direct assignment statement arr(i)=i and no subroutine call to add(), the output is correct.
b) If I use loop independent with the call to add(), the results are incorrect.
c) If I call add() without loop independent, the results are correct.
d) If I use loop independent with the subroutine call and add the seq clause to the routine directive, the results are correct.

Can someone explain the relation between loop independent, calling an OpenACC routine, and the seq clause?

I am using PGI 14.6.

Thank you,
K

module m_test
  implicit none

  contains
  subroutine add(arr,i)
    implicit none
!$acc routine

    integer, dimension(10), intent(inout)  :: arr
    integer, intent(in)                    :: i

    arr(i) = i
  end subroutine
end module

program test
  use m_test
  implicit none

  integer, dimension(10)  :: arr
  integer                 :: i

!$acc kernels copy(arr)
!$acc loop independent
  do i=1,10
    !arr(i) = i
    call add(arr,i)
  end do
!$acc end kernels

  do i=1,10
    print *, arr(i)
  end do
end program test


Hi K,

The default schedule for “routine” is “gang”.

For B, you have a gang-scheduled loop calling a gang-scheduled routine, which isn’t allowed. I’ve added an RFE (TPR#20634) to see if we can flag this as an error.

For C, without “independent” the outer loop isn’t accelerated and a scalar kernel is generated, which doesn’t conflict with the “gang” schedule of the “routine”.

For D, “seq” is the correct way to schedule this “routine”.
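
As a minimal sketch of case D, assuming the only change to the posted code is adding “seq” to the “routine” directive, the module and program would look roughly like this:

```fortran
module m_test
  implicit none

contains
  subroutine add(arr, i)
    implicit none
    ! seq: each calling thread runs the whole routine body by itself,
    ! so the routine can be called from inside a parallelized loop.
!$acc routine seq

    integer, dimension(10), intent(inout) :: arr
    integer, intent(in)                   :: i

    arr(i) = i
  end subroutine add
end module m_test

program test
  use m_test
  implicit none

  integer, dimension(10) :: arr
  integer                :: i

  arr = 0

!$acc kernels copy(arr)
!$acc loop independent
  do i = 1, 10
    call add(arr, i)
  end do
!$acc end kernels

  do i = 1, 10
    print *, arr(i)
  end do
end program test
```

Each iteration writes a different element of arr, so the “independent” assertion holds and the loop can be scheduled across threads while add() itself stays sequential.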

Hope this helps,
Mat

Thanks for your reply.

When you say the default schedule for routine is gang, what does it mean? That is, what happens when you parallelize a loop using vector parallelism only (1 block, multiple threads), and a call is made to acc routine from inside the loop? Shouldn’t every thread call the acc routine independently?

Also, what exactly does seq do? The specification is not clear to me. When an acc routine is called from multiple threads, does seq run the routine in a sequential manner? In short, how does case D work?

– K

Hi K,

“seq” is short for “sequential”. It simply creates device code for the routine, and every thread that calls the routine executes all of the code in it.

However, other schedules can be applied to a “routine”. For example, you could use “vector”, and the compiler will parallelize the loops in the routine across multiple threads. Hence, instead of executing all loop iterations in the routine, each thread executes only a portion.

“gang” is the topmost schedule and essentially means that all parallelization is contained in the routine. The caller must be within a parallel region but not within a parallel loop.

Think of it in terms of loop levels:

“seq” is the innermost body of code

 subroutine foo()
 !$acc routine seq
 end subroutine
...
 !$acc parallel loop gang
 do i=1,N
   !$acc loop worker
   do j=1,M
     !$acc loop vector
     do k=1,P
       call foo()
     end do
   end do
 end do

“vector” is the inner parallel loop

 subroutine foo()
 !$acc routine vector
 !$acc loop vector
 do k=1,P
    ....
 end do
 end subroutine
...
 !$acc parallel loop gang
 do i=1,N
   !$acc loop worker
   do j=1,M
     call foo()
   end do
 end do

“worker” is the middle parallel loop

 subroutine foo()
 !$acc routine worker
 !$acc loop worker
 do j=1,M
   !$acc loop vector
   do k=1,P
      ....
   end do
 end do
 end subroutine
...
 !$acc parallel loop gang
 do i=1,N
   call foo()
 end do

“gang” is the outermost parallel loop

 subroutine foo()
 !$acc routine gang
 !$acc loop gang
 do i=1,N
   !$acc loop worker
   do j=1,M
     !$acc loop vector
     do k=1,P
        ....
     end do
   end do
 end do
 end subroutine
...
 !$acc parallel
   call foo()
 !$acc end parallel
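
To make the “vector” case concrete, here is a small self-contained sketch. The names m_rows, fill_row and demo, and the array sizes, are made up for illustration and are not from the code posted above. It assumes the compiler accepts a contiguous column section as the actual argument of a device routine.

```fortran
module m_rows
  implicit none

contains
  ! Hypothetical helper: fills one column. The loop inside is split across
  ! the vector lanes of whichever gang makes the call.
  subroutine fill_row(row, n, val)
    !$acc routine vector
    integer, intent(in)    :: n, val
    integer, intent(inout) :: row(n)
    integer :: k

    !$acc loop vector
    do k = 1, n
      row(k) = val + k
    end do
  end subroutine fill_row
end module m_rows

program demo
  use m_rows
  implicit none

  integer, parameter :: ncols = 4, nrows = 8
  integer :: a(nrows, ncols)
  integer :: i

  a = 0

  ! The caller uses only the gang level, leaving the vector level
  ! free for the loop inside fill_row.
  !$acc parallel loop gang copy(a)
  do i = 1, ncols
    call fill_row(a(:, i), nrows, 10*i)
  end do

  print *, a(1, 1), a(nrows, ncols)
end program demo
```

The point of the design is that the calling loop must not already use the level named on the “routine” directive: a gang loop may call a vector routine, but a vector loop may not.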

Hope this helps,
Mat

TPR 20634 (OpenACC: Give error when calling a “routine” with wrong schedule) is fixed in the current 14.9 release.

Thanks,
dave