# Relation between loop independent and routine seq

In the example shown below, please see this compute region:

``````
!$acc kernels copy(arr)
!$acc loop independent
do i=1,10
   call add(arr, i)
   !arr(i) = i
end do
!$acc end kernels
``````

a) With `loop independent` and the direct assignment `arr(i) = i` (no call to add()), the output is correct.
b) With `loop independent` and the call to add(), the results are incorrect.
c) With the call to add() but without `loop independent`, the results are correct.
d) With `loop independent` and the call to add(), where the seq clause is added to the routine directive, the results are correct.

Can someone explain the relation between `loop independent`, calling an OpenACC subroutine, and the seq clause?

I am using PGI 14.6.

Thank you,
K

``````
module m_test
implicit none

contains
subroutine add(arr, i)
   implicit none
   !$acc routine

   integer, dimension(10), intent(inout)  :: arr
   integer, intent(in)                    :: i

   arr(i) = i
end subroutine
end module

program test
use m_test
implicit none

integer, dimension(10)  :: arr
integer                 :: i

!$acc kernels copy(arr)
!$acc loop independent
do i=1,10
   call add(arr, i)
   !arr(i) = i
end do
!$acc end kernels

do i=1,10
   print *, arr(i)
end do
end program test
``````


Hi K,

The default schedule for “routine” is “gang”.

For B, you have a gang-scheduled loop calling a gang-scheduled routine, which isn’t allowed. I’ve filed an RFE (TPR#20634) to see if the compiler can flag this as an error.

For C, without “independent” the outer loop isn’t accelerated and a scalar kernel is generated, which doesn’t conflict with the “gang” schedule of the “routine”.

For D, “seq” is the correct way to schedule this “routine”.
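As a minimal sketch of case D, here is the test program with the routine scheduled as “seq”. The argument order `add(arr, i)` is an assumption based on the declarations in the original post, and `arr = 0` is added only so the array has a defined value before the `copy`.

```fortran
module m_test
  implicit none
contains
  ! Assumed signature: the original post only shows the declarations.
  subroutine add(arr, i)
    ! seq: no parallelism inside the routine, so each calling
    ! thread runs the whole body for its own value of i.
    !$acc routine seq
    integer, dimension(10), intent(inout) :: arr
    integer, intent(in)                   :: i

    arr(i) = i
  end subroutine add
end module m_test

program test
  use m_test
  implicit none

  integer, dimension(10) :: arr
  integer                :: i

  arr = 0   ! added so the host copy is defined before copy(arr)

  !$acc kernels copy(arr)
  !$acc loop independent
  do i = 1, 10
    call add(arr, i)
  end do
  !$acc end kernels

  do i = 1, 10
    print *, arr(i)
  end do
end program test
```

Because the directives are comments to a compiler without OpenACC enabled, the same source should also build and run serially, which gives a convenient reference result to compare against.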

Hope this helps,
Mat

When you say the default schedule for routine is gang, what does it mean? That is, what happens when you parallelize a loop using vector parallelism only (1 block, multiple threads), and a call is made to acc routine from inside the loop? Shouldn’t every thread call the acc routine independently?

Also, what exactly does seq do? The specification is not clear to me. When an acc routine is called from multiple threads, does seq run the routine in a sequential manner? In short, how does case D work?

– K

Hi K,

“seq” is short for “sequential”. It simply creates a device version of the routine with no parallelism inside it, so every thread that calls the routine executes all of its code.

However, other schedules can be applied to a “routine”. For example, you could use “vector”, and the compiler will parallelize the loops in the routine across multiple threads. Hence, instead of executing all loop iterations in the routine, each thread only executes a portion.

“gang” is the topmost schedule and essentially means that all parallelization is contained in the routine. The caller must be within a parallel region but not within a parallel loop.

Think of it in terms of loop levels:

“seq” is the innermost body of code

``````
subroutine foo()
!$acc routine seq
end subroutine
...
!$acc parallel loop gang
do i=1,N
   !$acc loop worker
   do j=1,M
      !$acc loop vector
      do k=1,P
         call foo()
      end do
   end do
end do
``````

“vector” is the inner parallel loop

``````
subroutine foo()
!$acc routine vector
!$acc loop vector
do k=1,P
   ....
end do
end subroutine
...
!$acc parallel loop gang
do i=1,N
   !$acc loop worker
   do j=1,M
      call foo()
   end do
end do
``````

“worker” is the middle parallel loop

``````
subroutine foo()
!$acc routine worker
!$acc loop worker
do j=1,M
   !$acc loop vector
   do k=1,P
      ....
   end do
end do
end subroutine
...
!$acc parallel loop gang
do i=1,N
   call foo()
end do
``````

“gang” is the outermost parallel loop

``````
subroutine foo()
!$acc routine gang
!$acc loop gang
do i=1,N
   !$acc loop worker
   do j=1,M
      !$acc loop vector
      do k=1,P
         ....
      end do
   end do
end do
end subroutine
...
!$acc parallel
call foo()
!$acc end parallel
``````
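To make the levels concrete, here is a complete sketch of the “vector” case: the caller owns the gang loop and the routine owns the vector loop. The module `m_cols`, the helper `fill_col`, and the array sizes are hypothetical names chosen for illustration, not taken from the original code.

```fortran
module m_cols
  implicit none
contains
  ! Hypothetical helper: fills one column of length n.
  subroutine fill_col(col, n, i)
    !$acc routine vector
    integer, intent(in)    :: n, i
    integer, intent(inout) :: col(n)
    integer :: k

    ! The vector lanes of the calling gang share these iterations.
    !$acc loop vector
    do k = 1, n
      col(k) = 100*i + k
    end do
  end subroutine fill_col
end module m_cols

program demo
  use m_cols
  implicit none

  integer, parameter :: n = 4, m = 3
  integer :: a(n, m)
  integer :: i

  a = 0

  ! Gang-level loop in the caller; the vector level is left to the routine.
  !$acc parallel loop gang copy(a)
  do i = 1, m
    call fill_col(a(:, i), n, i)
  end do

  print *, a
end program demo
```

The caller deliberately puts no `vector` clause on its own loop, since that level is reserved for the loop inside the routine.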

Hope this helps,
Mat

TPR 20634 (OpenACC: Give error when calling a “routine” with wrong schedule) is fixed in the current 14.9 release.

Thanks,
dave