Reduction in Nested Loops

arun.ucla · July 16, 2020, 6:08am

I am trying to understand how reductions work in nested loops.
Consider a very simple program that counts the total number of inner-loop iterations.

PROGRAM main
  integer N, i, j
  integer*8 ans1
  N = 10000
  ans1 = 0

!$acc parallel copyin(N) reduction(+:ans1)
  do i = 1, N
     do j = 1, N
        ans1 = ans1 + 1
     enddo
  enddo
!$acc end parallel

  write(*,*) 'ans1 = ', ans1

END PROGRAM main

(Q-1) Why does this give the correct answer (100000000), when the OpenACC specification says: “If a variable is involved in a reduction that spans multiple nested loops where two or more of those loops have associated loop directives, a reduction clause containing that variable must appear on each of those loop directives.”

Now consider a small change that also counts the outer-loop iterations, using a second variable, ans2.

PROGRAM main
  integer N, i, j
  integer*8 ans1, ans2
  N = 10000
  ans1 = 0
  ans2 = 0

!$acc parallel copyin(N) reduction(+:ans1,ans2)
  do i = 1, N
     ans2 = ans2 + 1
     do j = 1, N
        ans1 = ans1 + 1
     enddo
  enddo
!$acc end parallel

  write(*,*) 'ans1 = ', ans1
  write(*,*) 'ans2 = ', ans2

END PROGRAM main

(Q-2) Why does this give a wrong answer (ans1 = 780000, ans2 = 10000)? What is surprising is that ans1, which was correct before, is now wrong!

If I add a loop construct before the j-loop with a reduction on ans1, it works. That is fine, since the specification says the same thing, but then Q-1 remains.

Could someone please clarify?

Thanks,
Arun

MatColgrove · July 16, 2020, 4:56pm

Correct. To be compliant with the standard you should technically put the reduction clause on both loops, though the compiler is often able to add the reduction implicitly when its analysis has not been overridden by the user (i.e. “auto” is used when the user has not explicitly added a “loop” directive, or when using “kernels”).

Note that you aren’t using a “loop” directive here, so the reduction clause is applied to the parallel region rather than to a loop, and therefore only covers the gang loop.

Looking at the compiler feedback messages, the compiler is not parallelizing the outer loop and is applying the reduction only to the inner loop:

% nvfortran -fast -acc test.F90 -Minfo=accel -V20.5 ; a.out
main:
      7, Generating copyin(n) [if not already present]
         Generating implicit copy(ans1) [if not already present]
         Generating Tesla code
          8, !$acc loop seq
          9, !$acc loop vector(128) ! threadidx%x
             Generating reduction(+:ans1)
      8, Loop is parallelizable
      9, Loop is parallelizable
 ans1 =                 100000000

As of 20.1, the second example gets the expected answer. However, without a “loop” directive the code still relies on the compiler analysis to apply the loop schedules, so it only parallelizes the inner loop and applies an implicit reduction:

% pgfortran -ta=tesla -Minfo=accel test2.F90 -V20.1 ; a.out
main:
      8, Generating copyin(n) [if not already present]
         Generating implicit copy(ans2) [if not already present]
         Generating Tesla code
          9, !$acc loop seq
             Generating reduction(+:ans1,ans2)
         11, !$acc loop vector(128) ! threadidx%x
             Generating implicit reduction(+:ans1)
      8, Generating implicit copy(ans1) [if not already present]
      9, Loop is parallelizable
     11, Loop is parallelizable
 ans1 =                 100000000
 ans2 =                     10000
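For reference, here is a sketch of a standard-compliant version of the second example that follows the rule quoted in Q-1: both loops get explicit loop directives, and ans1 appears in the reduction clause of each one because its reduction spans both loops. The gang/vector schedule and the combined “parallel loop” form are assumptions chosen for illustration, not something prescribed in the replies above. If it is compiled without OpenACC enabled (e.g. gfortran without -fopenacc), the directives are treated as comments and the program runs serially, which makes it easy to check the expected values ans1 = N*N = 100000000 and ans2 = N = 10000.

```fortran
program main
  implicit none
  integer :: N, i, j
  integer(kind=8) :: ans1, ans2

  N = 10000
  ans1 = 0
  ans2 = 0

  ! Outer loop is spread across gangs; both counters are reduced at gang level.
  ! (gang/vector split is an illustrative choice, not a requirement)
  !$acc parallel loop gang copyin(N) reduction(+:ans1,ans2)
  do i = 1, N
     ans2 = ans2 + 1
     ! Inner loop is spread across vector lanes. ans1 is listed again here
     ! because its reduction spans both loops that carry loop directives.
     !$acc loop vector reduction(+:ans1)
     do j = 1, N
        ans1 = ans1 + 1
     enddo
  enddo
  !$acc end parallel loop

  write(*,*) 'ans1 = ', ans1
  write(*,*) 'ans2 = ', ans2
end program main
```

Building it with nvfortran -acc -Minfo=accel, as in the replies above, lets you confirm in the feedback which loop schedules and reductions the compiler actually generated, rather than relying on its implicit analysis.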
