I have a Fortran loop like this:
    v(:,:) = 0.0
    do iEdge = 1,nEdges
      do i=1,nEdgesOnEdge(iEdge)
        eoe = edgesOnEdge(i,iEdge)
        do k = 1,nVertLevels
          v(k,iEdge) = v(k,iEdge) + weightsOnEdge(i,iEdge) * u(k, eoe)
        end do
      end do
    end do
Here nEdges ~ 500000 and nVertLevels ~ 50.
nEdgesOnEdge(iEdge) is between 3 and 10, and eoe can be any edge index from 1 to nEdges (~ 500000).
The values of weightsOnEdge are between 0.0 and 1.0.
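Written out as a formula, the loop makes its dependence structure easier to see. The notation here is introduced only for illustration: $n_e$ = nEdgesOnEdge(e), $w_{i,e}$ = weightsOnEdge(i,e), $\mathrm{eoe}(i,e)$ = edgesOnEdge(i,e), and it assumes v and u are distinct arrays:

$$
v_{k,e} \;=\; \sum_{i=1}^{n_e} w_{i,e}\, u_{k,\,\mathrm{eoe}(i,e)}, \qquad k = 1,\dots,\texttt{nVertLevels},\quad e = 1,\dots,\texttt{nEdges}
$$

Each $(k,e)$ pair writes its own element of v and the right-hand side only reads u, so the roughly $500000 \times 50 = 2.5 \times 10^{7}$ sums are independent of each other; only the short inner sum over $i$ (3 to 10 terms) accumulates into a single element.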
Can someone help me parallelize this loop with OpenACC?
I tried something like this:
    !$acc kernels
    v(:,:) = 0.0
    !$acc end kernels
    !$acc data copyin(nEdgesOnEdge, edgesOnEdge, weightsOnEdge, u), &
    !$acc copy(v)
    !$acc kernels
    !$acc loop gang independent
    do iEdge = 1,nEdges
      !$acc loop worker reduction(+:v(:,iEdge)) independent
      do i=1,nEdgesOnEdge(iEdge)
        eoe = edgesOnEdge(i,iEdge)
        !$acc loop vector independent
        do k = 1,nVertLevels
          v(k,iEdge) = v(k,iEdge) + weightsOnEdge(i,iEdge) * u(k, eoe)
        end do
      end do
    end do
    !$acc end kernels
    !$acc end data
It did not work.
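Below is a sketch of one common way to restructure a loop like this for OpenACC, offered under stated assumptions rather than as a verified fix. In the attempt above, different `i` iterations add into the same `v(k,iEdge)`, so parallelizing the `i` loop relies entirely on the array-section reduction `reduction(+:v(:,iEdge))`, which not every OpenACC compiler may accept or handle efficiently. Because `iEdge` and `k` are independent, the sketch instead collapses those two loops across gangs and vector lanes and keeps the short sum over `i` sequential in a private scalar. The module name `eoe_acc`, the subroutine `reconstruct_acc`, the leading dimension `maxEoE` of `edgesOnEdge`/`weightsOnEdge`, and the shape `u(nVertLevels, nEdges)` are hypothetical, chosen only for illustration.

```fortran
! Sketch: OpenACC offload of the edgesOnEdge weighted sum.
! Hypothetical (not from the original post): module/subroutine names,
! the dimension name maxEoE, and the dummy-argument shapes.
module eoe_acc
  implicit none
contains
  subroutine reconstruct_acc(nEdges, nVertLevels, maxEoE, nEdgesOnEdge, &
                             edgesOnEdge, weightsOnEdge, u, v)
    integer, intent(in)  :: nEdges, nVertLevels, maxEoE
    integer, intent(in)  :: nEdgesOnEdge(nEdges)
    integer, intent(in)  :: edgesOnEdge(maxEoE, nEdges)
    real,    intent(in)  :: weightsOnEdge(maxEoE, nEdges)
    real,    intent(in)  :: u(nVertLevels, nEdges)   ! assumes every eoe <= nEdges
    real,    intent(out) :: v(nVertLevels, nEdges)
    integer :: iEdge, i, k
    real    :: vsum

    ! Each (k, iEdge) pair writes its own element of v, so both loops are
    ! collapsed and spread over gangs and vector lanes. With k as the inner
    ! collapsed index, neighbouring lanes typically touch neighbouring k.
    !$acc parallel loop gang vector collapse(2) private(vsum) &
    !$acc   copyin(nEdgesOnEdge, edgesOnEdge, weightsOnEdge, u) copyout(v)
    do iEdge = 1, nEdges
      do k = 1, nVertLevels
        vsum = 0.0
        ! The sum over neighbouring edges is the only dependence:
        ! run it sequentially in each thread instead of reducing over i.
        !$acc loop seq
        do i = 1, nEdgesOnEdge(iEdge)
          vsum = vsum + weightsOnEdge(i, iEdge) * u(k, edgesOnEdge(i, iEdge))
        end do
        v(k, iEdge) = vsum   ! every element is assigned once, so no zeroing pass
      end do
    end do
  end subroutine reconstruct_acc
end module eoe_acc
```

A call would look like `call reconstruct_acc(nEdges, nVertLevels, size(edgesOnEdge, 1), nEdgesOnEdge, edgesOnEdge, weightsOnEdge, u, v)`, and it would replace both the separate `v(:,:) = 0.0` kernel and the loop nest. If the arrays already live on the device inside an enclosing `!$acc data` region, a `present(...)` clause could be used in place of `copyin`/`copyout`. With nvfortran, `-acc -Minfo=accel` reports how the loops were mapped; gfortran accepts the directives with `-fopenacc` and otherwise treats them as comments.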
Thanks,
Wei