Hi,
I have a section of code that parallelizes as intended and runs correctly with OpenMP, but not with OpenACC. Any advice?
OpenMP:
!$omp parallel default(none) &
!$omp&shared(NColor,indexL,itemL,indexU,itemU,AL,AU,D,ALU,perm,&
!$omp& NContact,indexCL,itemCL,indexCU,itemCU,CAL,CAU,&
!$omp& ZP,icToBlockIndex,blockIndexToColorIndex) &
!$omp&private(SW1,SW2,SW3,X1,X2,X3,ic,i,iold,isL,ieL,isU,ieU,j,k,blockIndex)
!C-- FORWARD
do ic=1,NColor
!$omp do schedule (static, 1)
  do blockIndex = icToBlockIndex(ic-1)+1, icToBlockIndex(ic)
    do i = blockIndexToColorIndex(blockIndex-1)+1, &
           blockIndexToColorIndex(blockIndex)
      ! do i = startPos(threadNum, ic), endPos(threadNum, ic)
      iold = perm(i)
      SW1= ZP(3*iold-2)
      SW2= ZP(3*iold-1)
      SW3= ZP(3*iold  )
      isL= indexL(i-1)+1
      ieL= indexL(i)
      do j= isL, ieL
        !k= perm(itemL(j))
        k= itemL(j)
        X1= ZP(3*k-2)
        X2= ZP(3*k-1)
        X3= ZP(3*k  )
        SW1= SW1 - AL(9*j-8)*X1 - AL(9*j-7)*X2 - AL(9*j-6)*X3
        SW2= SW2 - AL(9*j-5)*X1 - AL(9*j-4)*X2 - AL(9*j-3)*X3
        SW3= SW3 - AL(9*j-2)*X1 - AL(9*j-1)*X2 - AL(9*j  )*X3
      enddo ! j
      if (NContact.ne.0) then
        isL= indexCL(i-1)+1
        ieL= indexCL(i)
        do j= isL, ieL
          !k= perm(itemCL(j))
          k= itemCL(j)
          X1= ZP(3*k-2)
          X2= ZP(3*k-1)
          X3= ZP(3*k  )
          SW1= SW1 - CAL(9*j-8)*X1 - CAL(9*j-7)*X2 - CAL(9*j-6)*X3
          SW2= SW2 - CAL(9*j-5)*X1 - CAL(9*j-4)*X2 - CAL(9*j-3)*X3
          SW3= SW3 - CAL(9*j-2)*X1 - CAL(9*j-1)*X2 - CAL(9*j  )*X3
        enddo ! j
      endif
      X1= SW1
      X2= SW2
      X3= SW3
      X2= X2 - ALU(9*i-5)*X1
      X3= X3 - ALU(9*i-2)*X1 - ALU(9*i-1)*X2
      X3= ALU(9*i  )* X3
      X2= ALU(9*i-4)*( X2 - ALU(9*i-3)*X3 )
      X1= ALU(9*i-8)*( X1 - ALU(9*i-6)*X3 - ALU(9*i-7)*X2)
      ZP(3*iold-2)= X1
      ZP(3*iold-1)= X2
      ZP(3*iold  )= X3
    enddo ! i
  enddo ! blockIndex
!$omp end do
enddo ! ic
...
!$omp end parallel
OpenACC:
do ic=1,NColor
!$acc parallel loop collapse(2)
  do blockIndex = icToBlockIndex(ic-1)+1, icToBlockIndex(ic)
    do i = blockIndexToColorIndex(blockIndex-1)+1, &
           blockIndexToColorIndex(blockIndex)
      ! do i = startPos(threadNum, ic), endPos(threadNum, ic)
      iold = perm(i)
      SW1= ZP(3*iold-2)
      SW2= ZP(3*iold-1)
      SW3= ZP(3*iold  )
      isL= indexL(i-1)+1
      ieL= indexL(i)
!$acc loop vector
      do j= isL, ieL
        !k= perm(itemL(j))
        k= itemL(j)
        X1= ZP(3*k-2)
        X2= ZP(3*k-1)
        X3= ZP(3*k  )
        SW1= SW1 - AL(9*j-8)*X1 - AL(9*j-7)*X2 - AL(9*j-6)*X3
        SW2= SW2 - AL(9*j-5)*X1 - AL(9*j-4)*X2 - AL(9*j-3)*X3
        SW3= SW3 - AL(9*j-2)*X1 - AL(9*j-1)*X2 - AL(9*j  )*X3
      enddo ! j
      if (NContact.ne.0) then
        isL= indexCL(i-1)+1
        ieL= indexCL(i)
!$acc loop vector
        do j= isL, ieL
          !k= perm(itemCL(j))
          k= itemCL(j)
          X1= ZP(3*k-2)
          X2= ZP(3*k-1)
          X3= ZP(3*k  )
          SW1= SW1 - CAL(9*j-8)*X1 - CAL(9*j-7)*X2 - CAL(9*j-6)*X3
          SW2= SW2 - CAL(9*j-5)*X1 - CAL(9*j-4)*X2 - CAL(9*j-3)*X3
          SW3= SW3 - CAL(9*j-2)*X1 - CAL(9*j-1)*X2 - CAL(9*j  )*X3
        enddo ! j
      endif
      X1= SW1
      X2= SW2
      X3= SW3
      X2= X2 - ALU(9*i-5)*X1
      X3= X3 - ALU(9*i-2)*X1 - ALU(9*i-1)*X2
      X3= ALU(9*i  )* X3
      X2= ALU(9*i-4)*( X2 - ALU(9*i-3)*X3 )
      X1= ALU(9*i-8)*( X1 - ALU(9*i-6)*X3 - ALU(9*i-7)*X2)
      ZP(3*iold-2)= X1
      ZP(3*iold-1)= X2
      ZP(3*iold  )= X3
    enddo ! i
  enddo ! blockIndex
!$acc end parallel loop
enddo ! ic
The OpenACC version fails after a varying number of iterations (typically 5-38), which suggests a race condition. How can I avoid this race condition? The OpenMP version does not show the problem.
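One difference between the two versions that may matter: the OpenMP version distributes only the blockIndex loop, so the i loop inside a block runs sequentially on one thread, while collapse(2) in the OpenACC version also distributes the i iterations (and its inner loop bounds depend on blockIndex). The j loops marked loop vector also update SW1..SW3 without an explicit reduction clause. If rows inside one block read ZP values written by earlier rows of the same block (an assumption, not confirmed here), the pattern that mirrors the OpenMP version would be gang parallelism over blocks with a sequential loop inside each block. A minimal self-contained sketch on toy data, where blockPtr and z are hypothetical stand-ins for blockIndexToColorIndex and ZP:

program block_seq_sketch
  implicit none
  ! Toy sizes, chosen only for illustration.
  integer, parameter :: nBlocks = 4, blockSize = 8
  integer, parameter :: n = nBlocks*blockSize
  integer :: blockPtr(0:nBlocks)   ! hypothetical: end row of each block
  double precision :: z(n)         ! hypothetical: stands in for ZP
  integer :: b, i

  do b = 0, nBlocks
    blockPtr(b) = b*blockSize
  enddo
  z = 1.0d0

  ! Blocks are independent, so they are spread over gangs.
!$acc parallel loop gang copyin(blockPtr) copy(z)
  do b = 1, nBlocks
    ! Rows inside a block depend on the previous row, so this loop
    ! is kept sequential instead of being collapsed with the outer one.
!$acc loop seq
    do i = blockPtr(b-1)+2, blockPtr(b)
      z(i) = z(i) + z(i-1)
    enddo
  enddo

  ! Each block becomes 1,2,...,blockSize, so both values equal blockSize.
  print *, z(blockSize), z(n)
end program block_seq_sketch

Without OpenACC enabled the directives are plain comments and the program runs serially with the same result. In the real loop, the j loops could stay sequential at first to check correctness, and only then be vectorized with an explicit reduction.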
Thanks for any ideas!
Best regards,
Olav