Array assignment in OpenACC using dummy indices

Hi All,

I am trying to implement OpenACC in Fortran loops where arrays are assigned using dummy variables that are not the loop indices. For example:

subroutine test_sub
implicit none
integer :: ix,iy,jx,jy,n
real, allocatable :: X(:,:)

n = 1000
allocate(X(n,n))

!$acc kernels loop copy(X)
do ix = -n/2,n/2
    !$acc loop private(jx,jy)
    do iy = -n/2,n/2
        jx = modulo(ix,n)+1
        jy = modulo(iy,n)+1

        X(jy,jx) = jx**2 + jy**2
    end do
end do

end subroutine

However, the -Minfo=accel diagnostic output tells me that this cannot be parallelized (without privatization of X, which is impossible because it is too large):

Generating copy(x(:,:))
Parallelization would require privatization of array x(:,:)
Accelerator scalar kernel generated
Parallelization would require privatization of array x(:,:)

Is there any way to do this?

Hi nrlugg,

> Is there any way to do this?

Add the “independent” clause or use the “parallel” compute construct instead of “kernels”.

With “kernels”, the compiler analyzes the loops to determine whether they are parallelizable. Because your indices are computed rather than taken directly from the loop variables, the compiler can’t prove that every iteration produces a distinct (“jx”, “jy”) pair. It therefore has to assume the worst case, in which different iterations write to the same element of X, and it won’t parallelize the loop. Adding “independent” asserts to the compiler that the iterations do not depend on one another, so the loop is safe to parallelize.
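Since “independent” is a promise the compiler does not verify, it can be worth checking serially that the computed indices really are all different. The following is only a sketch of such a check for the index mapping used in this thread; the program name check_index_map and the array hits are made up for illustration.

```fortran
program check_index_map
    implicit none
    integer, parameter :: n = 1000
    integer :: ix, jx
    integer :: hits(n)

    ! Count how many loop iterations land on each target index.
    hits = 0
    do ix = -n/2, n/2
        jx = modulo(ix, n) + 1
        hits(jx) = hits(jx) + 1
    end do

    ! A one-to-one mapping gives zero for both counts.
    print *, 'indices written more than once:', count(hits > 1)
    print *, 'indices never written:         ', count(hits == 0)
end program check_index_map
```

With the bounds -n/2..n/2 and n = 1000, the iterations ix = -500 and ix = 500 both map to jx = 501, so exactly one index is written twice. In the thread's loop both iterations store the same value, so the result does not change, but an upper bound of n/2-1 would make the mapping strictly one-to-one.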

With “parallel”, you’re telling the compiler which loops to parallelize, so no dependency analysis is performed.

% cat test.f90
subroutine test_sub
implicit none
integer :: ix,iy,jx,jy,n
real, allocatable :: X(:,:)

n = 1000
allocate(X(n,n))

!$acc kernels loop independent copy(X)
do ix = -n/2,n/2
    !$acc loop independent private(jx,jy)
    do iy = -n/2,n/2
        jx = modulo(ix,n)+1
        jy = modulo(iy,n)+1

        X(jy,jx) = jx**2 + jy**2
    end do
end do

end subroutine
% pgfortran -acc -Minfo=accel -c test.f90
test_sub:
      9, Generating copy(x(:,:))
     10, Loop is parallelizable
     12, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
         10, !$acc loop gang ! blockidx%y
         12, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
             Interchanging generated vector loop outwards

Hope this helps,
Mat

Hi Mat,

Both the “independent” clause and “parallel” compute construct work perfectly! Thanks for your answer and explanation :)

As an aside, I initially tried the “parallel” construct, but it didn’t seem to parallelize. At that time I had only a single “parallel loop” directive around the outer loop. Trying it again with the “loop” directive on the inner loop as well (i.e., simply replacing “kernels” with “parallel” in my test_sub) works, as you suggested.
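For reference, the “parallel” variant described above might look roughly like the following. It is a sketch reconstructed from the description (the original test_sub with “kernels” replaced by “parallel”), not code posted in the thread, and the name test_sub_parallel is made up for illustration.

```fortran
subroutine test_sub_parallel
    implicit none
    integer :: ix, iy, jx, jy, n
    real, allocatable :: X(:,:)

    n = 1000
    allocate(X(n,n))

    ! "parallel" tells the compiler to parallelize this loop;
    ! no dependency analysis is performed.
    !$acc parallel loop copy(X)
    do ix = -n/2, n/2
        ! As in the thread, the inner loop gets its own "loop" directive.
        !$acc loop private(jx,jy)
        do iy = -n/2, n/2
            jx = modulo(ix,n) + 1
            jy = modulo(iy,n) + 1

            X(jy,jx) = jx**2 + jy**2
        end do
    end do

end subroutine test_sub_parallel
```

Compiling with the same flags as above (-acc -Minfo=accel) should show whether both loops were scheduled; without -acc the directives are treated as comments and the routine runs serially.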

Thanks again,
Nath