I am trying to understand how reductions work in nested loops.
This is a follow-up to the question "Reduction in Nested Loops".
The OpenACC manual says: “If a variable is involved in a reduction that spans multiple nested loops where two or more of those loops have associated loop directives, a reduction clause containing that variable must appear on each of those loop directives.”
Please find a simple program below, in which:
- ans1 counts the total number of inner-loop iterations (N*N = 100000000).
- ans2 counts the total number of outer-loop iterations (N = 10000).
However, this program gives a different (wrong) answer on every run!
After the program, I list three changes, each of which gives the correct answer (but I am not able to understand WHY).
PROGRAM main
  integer N, i, j
  integer*8 ans1, ans2
  N = 10000
  ans1 = 0
  ans2 = 0
  !$acc parallel copyin(N) copy(ans1,ans2)
  !$acc loop reduction(+:ans1,ans2)
  do i = 1, N
    ans2 = ans2 + 1
    !$acc loop reduction(+:ans1)
    do j = 1, N
      ans1 = ans1 + 1
    enddo
    !$acc end loop
  enddo
  !$acc end loop
  !$acc end parallel
  write(*,*) 'ans1 = ', ans1
  write(*,*) 'ans2 = ', ans2
END PROGRAM main
If I make any one of these changes, I get the correct answer:
(1) Also put the outer loop's reduction clause, reduction(+:ans1,ans2), on the parallel construct.
(2) Merge the outer loop directive (with its reduction) into the parallel construct, i.e. use a combined parallel loop directive.
(3) Remove the initializations ans1 = 0 and ans2 = 0, AND change ans1 and ans2 from copy to copyout.
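For concreteness, changes (1) and (2) can be written out as below. This is a sketch: the module name `reduction_variants` and the subroutine names `variant1` and `variant2` are hypothetical, `integer(int64)` stands in for `integer*8`, and the `!$acc end loop` lines are left out because a loop directive applies to the do loop that immediately follows it. Compiled without OpenACC support (for example gfortran without -fopenacc), every `!$acc` line is an ordinary comment, so both subroutines just run the loops serially and return the expected 100000000 and 10000. That serial run only gives the reference values; it says nothing about how a particular OpenACC compiler handles these directives on the device.

```fortran
module reduction_variants
  use iso_fortran_env, only: int64
  implicit none
contains

  ! Change (1): the original directives, with the outer loop's
  ! reduction clause also placed on the parallel construct.
  subroutine variant1(n, ans1, ans2)
    integer, intent(in) :: n
    integer(int64), intent(out) :: ans1, ans2
    integer :: i, j

    ans1 = 0
    ans2 = 0
    !$acc parallel copyin(n) copy(ans1, ans2) reduction(+:ans1, ans2)
    !$acc loop reduction(+:ans1, ans2)
    do i = 1, n
      ans2 = ans2 + 1
      !$acc loop reduction(+:ans1)
      do j = 1, n
        ans1 = ans1 + 1
      end do
    end do
    !$acc end parallel
  end subroutine variant1

  ! Change (2): the outer loop directive merged into the parallel
  ! construct (combined "parallel loop"); the inner loop keeps its
  ! own reduction clause, as the quoted rule requires.
  subroutine variant2(n, ans1, ans2)
    integer, intent(in) :: n
    integer(int64), intent(out) :: ans1, ans2
    integer :: i, j

    ans1 = 0
    ans2 = 0
    !$acc parallel loop copyin(n) copy(ans1, ans2) reduction(+:ans1, ans2)
    do i = 1, n
      ans2 = ans2 + 1
      !$acc loop reduction(+:ans1)
      do j = 1, n
        ans1 = ans1 + 1
      end do
    end do
  end subroutine variant2

end module reduction_variants
```

Change (3) is not sketched here, because without the initializations the starting values of ans1 and ans2 are undefined in a serial build, so there is no well-defined reference result to compare against.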
My questions are:
(Q-1) Why does the original program NOT work?
(Q-2) Why does each of (1), (2) and (3) give the correct output?
Please help.
Thanks,
Arun