How to parallelize a nested loop with reduction

I have a Fortran loop as below:

v(:,:) = 0.0

do iEdge = 1,nEdges
   do i=1,nEdgesOnEdge(iEdge)
      eoe = edgesOnEdge(i,iEdge)
      do k = 1,nVertLevels
         v(k,iEdge) = v(k,iEdge) + weightsOnEdge(i,iEdge) * u(k, eoe)
      end do
   end do
end do

Where nEdges ~ 500000, nVertLevels ~ 50,
nEdgesOnEdge is between 3 and 10, and eoe is between 1 and 500000.
weightsOnEdge is between 0.0 and 1.0.

Can someone help me parallelize this loop with OpenACC?

I tried with something like:
!$acc kernels
v(:,:) = 0.0
!$acc end kernels

!$acc data copyin(nEdgesOnEdge, edgesOnEdge, weightsOnEdge, u), &
!$acc copy(v)
!$acc kernels
!$acc loop gang independent
do iEdge = 1,nEdges
   !$acc loop worker reduction(+:v(:,iEdge)) independent
   do i=1,nEdgesOnEdge(iEdge)
      eoe = edgesOnEdge(i,iEdge)
      !$acc loop vector independent
      do k = 1,nVertLevels
         v(k,iEdge) = v(k,iEdge) + weightsOnEdge(i,iEdge) * u(k, eoe)
      end do
   end do
end do
!$acc end kernels
!$acc end data

(it did not work.)

Thanks,

Wei

I think the reduction variable cannot be a vector/matrix element; it should be a scalar. In your case, you should use a scalar in place of v(k,iEdge) and, after the reduction loop, copy the scalar to v(k,iEdge).

This is the case for OpenMP parallelisation.
I think the same is true for OpenACC.
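
To illustrate the scalar suggestion, here is a minimal sketch, not a tested or tuned solution. It assumes the loops can be reordered so that k is outside i; each (k,iEdge) element is then summed into a private scalar by a single thread, so the i loop can run sequentially and no reduction clause is needed. The names tmp, vRef and maxEdges, the small array sizes, and the made-up test data are my own assumptions, not part of the original code. Without an OpenACC compiler the directives are treated as comments and the program runs serially.

```fortran
program edge_sum_sketch
  implicit none
  ! Small, made-up sizes for illustration (the post has nEdges ~ 500000).
  integer, parameter :: nEdges = 1000, nVertLevels = 50, maxEdges = 10
  integer :: nEdgesOnEdge(nEdges), edgesOnEdge(maxEdges, nEdges)
  real :: weightsOnEdge(maxEdges, nEdges), u(nVertLevels, nEdges)
  real :: v(nVertLevels, nEdges), vRef(nVertLevels, nEdges)
  real :: tmp                      ! hypothetical scalar accumulator
  integer :: iEdge, i, k, eoe

  ! Made-up test data: weights in [0,1), 3..10 neighbours per edge,
  ! neighbour indices in 1..nEdges.
  call random_number(weightsOnEdge)
  call random_number(u)
  do iEdge = 1, nEdges
     nEdgesOnEdge(iEdge) = 3 + mod(iEdge, 8)
     do i = 1, maxEdges
        edgesOnEdge(i, iEdge) = 1 + mod(iEdge*7 + i*13, nEdges)
     end do
  end do

  ! Serial reference: the loop from the question, unchanged.
  vRef(:,:) = 0.0
  do iEdge = 1, nEdges
     do i = 1, nEdgesOnEdge(iEdge)
        eoe = edgesOnEdge(i, iEdge)
        do k = 1, nVertLevels
           vRef(k, iEdge) = vRef(k, iEdge) + weightsOnEdge(i, iEdge) * u(k, eoe)
        end do
     end do
  end do

  ! Sketch: gang over edges, vector over levels, sequential sum over i.
  ! v is fully overwritten, so it needs no zeroing and can be copyout.
  !$acc data copyin(nEdgesOnEdge, edgesOnEdge, weightsOnEdge, u) copyout(v)
  !$acc parallel loop gang
  do iEdge = 1, nEdges
     !$acc loop vector private(tmp, eoe)
     do k = 1, nVertLevels
        tmp = 0.0
        !$acc loop seq
        do i = 1, nEdgesOnEdge(iEdge)
           eoe = edgesOnEdge(i, iEdge)
           tmp = tmp + weightsOnEdge(i, iEdge) * u(k, eoe)
        end do
        ! Copy the scalar back once the sum over i is complete.
        v(k, iEdge) = tmp
     end do
  end do
  !$acc end data

  print *, 'max abs difference vs. serial loop:', maxval(abs(v - vRef))
end program edge_sum_sketch
```

Whether gang/vector is the best mapping for nVertLevels ~ 50 depends on the compiler and hardware, so it is worth checking the compiler feedback and timing it against alternatives such as collapsing the two outer loops.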