# How to parallelize a nested loop with a reduction

I have a Fortran loop as below:

    v(:,:) = 0.0

    do iEdge = 1,nEdges
       do i=1,nEdgesOnEdge(iEdge)
          eoe = edgesOnEdge(i,iEdge)
          do k = 1,nVertLevels
             v(k,iEdge) = v(k,iEdge) + weightsOnEdge(i,iEdge) * u(k, eoe)
          end do
       end do
    end do

Here nEdges ~ 500000 and nVertLevels ~ 50.
nEdgesOnEdge is between 3 and 10, and eoe is between 1 and 500000.
weightsOnEdge is between 0.0 and 1.0.

Can someone help me parallelize this loop with OpenACC?

I tried something like:

    !$acc kernels
    v(:,:) = 0.0
    !$acc end kernels

    !$acc data copyin(nEdgesOnEdge, edgesOnEdge, weightsOnEdge, u), &
    !$acc      copy(v)
    !$acc kernels
    !$acc loop gang independent
    do iEdge = 1,nEdges
       !$acc loop worker reduction(+:v(:,iEdge)) independent
       do i=1,nEdgesOnEdge(iEdge)
          eoe = edgesOnEdge(i,iEdge)
          !$acc loop vector independent
          do k = 1,nVertLevels
             v(k,iEdge) = v(k,iEdge) + weightsOnEdge(i,iEdge) * u(k, eoe)
          end do
       end do
    end do
    !$acc end kernels
    !$acc end data

(It did not work.)

Thanks,

Wei

I think the reduction variable cannot be a vector or matrix element; it should be a scalar. In your case, use a scalar in place of v(k,iEdge) and, after the reduction loop, copy the scalar into v(k,iEdge).

This is the case for OpenMP parallelisation, and I think the same is true for OpenACC.
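
To make the scalar suggestion concrete, here is a minimal sketch rather than a tested fix for the original code. It assumes the loops can be reordered so that `k` sits outside `i`, which is my own choice and not something stated above. The accumulator `tmp`, the program name, the array `vRef`, and the small test sizes and data are all made up for illustration. Since each `iEdge` iteration writes only to its own column `v(:,iEdge)`, the sketch puts gangs on `iEdge`, vector lanes on `k`, and runs the short `i` loop sequentially, so no `reduction` clause is needed at all.

```fortran
program edge_reduction_sketch
  implicit none
  ! Hypothetical small sizes so the sketch runs quickly; the post uses
  ! nEdges ~ 500000 and nVertLevels ~ 50.
  integer, parameter :: nEdges = 1000, nVertLevels = 50, maxEdges = 10
  integer :: nEdgesOnEdge(nEdges), edgesOnEdge(maxEdges, nEdges)
  real :: weightsOnEdge(maxEdges, nEdges)
  real :: u(nVertLevels, nEdges), v(nVertLevels, nEdges)
  real :: vRef(nVertLevels, nEdges)
  real :: tmp
  integer :: iEdge, i, k, eoe

  ! Made-up test data in the ranges described in the question.
  do iEdge = 1, nEdges
    nEdgesOnEdge(iEdge) = 3 + mod(iEdge, 8)              ! 3..10
    do i = 1, maxEdges
      edgesOnEdge(i, iEdge) = 1 + mod(iEdge*7 + i*13, nEdges)
      weightsOnEdge(i, iEdge) = real(i) / real(maxEdges) ! 0.1..1.0
    end do
    do k = 1, nVertLevels
      u(k, iEdge) = 0.25 * real(mod(k + iEdge, 17))
    end do
  end do

  ! Reference result: the original serial loop from the question.
  vRef(:,:) = 0.0
  do iEdge = 1, nEdges
    do i = 1, nEdgesOnEdge(iEdge)
      eoe = edgesOnEdge(i, iEdge)
      do k = 1, nVertLevels
        vRef(k, iEdge) = vRef(k, iEdge) + weightsOnEdge(i, iEdge) * u(k, eoe)
      end do
    end do
  end do

  ! OpenACC sketch: scalar accumulator, k loop moved outside the i loop.
  ! Without an OpenACC compiler the directives are plain comments and the
  ! loops simply run serially.
  !$acc data copyin(nEdgesOnEdge, edgesOnEdge, weightsOnEdge, u) copyout(v)
  !$acc parallel loop gang
  do iEdge = 1, nEdges
    !$acc loop vector private(tmp)
    do k = 1, nVertLevels
      tmp = 0.0
      !$acc loop seq
      do i = 1, nEdgesOnEdge(iEdge)
        tmp = tmp + weightsOnEdge(i, iEdge) * u(k, edgesOnEdge(i, iEdge))
      end do
      v(k, iEdge) = tmp   ! copy the scalar back after the accumulation
    end do
  end do
  !$acc end data

  ! Compare against the serial reference with a loose tolerance.
  print *, 'max abs difference:', maxval(abs(v - vRef))
  if (maxval(abs(v - vRef)) > 1.0e-4) error stop 'results differ'
end program edge_reduction_sketch
```

With `k` on the vector loop, neighbouring lanes read neighbouring elements of `u(:,eoe)`, which should suit the column-major layout. Because `v` is fully overwritten inside the region, the separate `v(:,:) = 0.0` kernel is not needed in this variant.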