Hi,
From my reading of the OpenACC spec, it seems the reduction clause must be put on every loop within a parallel region. It also seems to be needed on the parallel directive itself.
In the past, I have put the reduction only on the parallel directive and not on the loops, and it seemed to work.
Is the following code correct to compute result=SUM(P(:)*Q(:))? (P and Q are stride-1 but smaller than nr*nt*np.)
!$acc parallel default(present) reduction(+:result)
!$acc loop collapse(3) reduction(+:result)
      do k=2,npm1
        do j=2,ntm1
          do i=1,nrm1
            l=ntm2*nrm1*(k-2)+nrm1*(j-2)+i
            if (rb0.or.i.gt.1) then
              result=result+p(l)*q(l)
            end if
          enddo
        enddo
      enddo
!$acc loop collapse(3) reduction(+:result)
      do k=2,npm1
        do j=jm0,jm1
          do i=2,nrm1
            l=(npm2*ntm2*nrm1)
     &       +(jm1-jm0+1)*nrm2*(k-2)+nrm2*(j-jm0)+(i-1)
            if (tb0.or.j.gt.1) then
              result=result+p(l)*q(l)
            end if
          enddo
        enddo
      enddo
!$acc loop collapse(3) reduction(+:result)
      do k=1,npm1
        do j=2,ntm1
          do i=2,nrm1
            l=(npm2*ntm2*nrm1)
     &       +(npm2*(jm1-jm0+1)*nrm2)
     &       +ntm2*nrm2*(k-1)+nrm2*(j-2)+(i-1)
            if (iproc_p.eq.0.or.k.gt.1) then
              result=result+p(l)*q(l)
            end if
          enddo
        enddo
      enddo
!$acc end parallel
- Ron