The intent of this code snippet is to parallelize the j and i loops but not the k loops. The 1-D arrays (dimensioned in k) and the scalars listed in the private clause should be private to each (i,j) iteration. Any idea what is wrong with my construct? Thanks.

!$acc region copyin(dzmx,ns,is,ie,js,je,ng,km,iad,ktop,gama,cp,cappa,rdgas,dm2,pm2,quick_p,c_core,pt,bdt,seq,grg,rcp,rdt) &
!$acc copy(dz2,w) copyout(p3)
!$acc do parallel independent private(c2,p2,pt2,r_p,r_n,rden,dz,dm,wm,dts,pdt,m_bot,m_top,r_bot,r_top,time_left,pe1,pbar,wbar,dt,z_frac,t_left,a1,b1,g2,k2,ke,kt,k0,k1,k3)
do j = js,je ! j_loop
!$acc do parallel independent
   do 6000 i=is,ie
      do 5000 n=1,ns
         dt = seq(n)
         do k=ktop,km
            dts(k) = -dz(k) / c2(k)
            pdt(k) = dts(k)*(p2(k)-pm2(i,j,k))
            r_p(k) = wm(k) + pdt(k)
            r_n(k) = wm(k) - pdt(k)
         enddo
         do k=ktop+1,km+1
            k2(k) = k-1
            m_top(k) = 0.
            r_top(k) = 0.
            time_left(k) = dt
         enddo
         do 444 ke=km+1,ktop+1,-1
            kt=k2(ke)
            do k=kt,ktop,-1
               z_frac = time_left(ke)/dts(k)
               if ( z_frac <1> 2 ) then
                     k1 = ke-1
                     k2(k1) = k
                     m_top(k1) = m_top(ke) - dm(k1)
                     r_top(k1) = r_top(ke) - r_n(k1)
                     time_left(k1) = time_left(ke) + dts(k1)
                  endif
                  m_top(ke) = m_top(ke) + z_frac*dm(k)
                  r_top(ke) = r_top(ke) + z_frac*r_n(k)
                  exit
               else
                  time_left(ke) = time_left(ke) - dts(k)
                  m_top(ke) = m_top(ke) + dm(k)
                  r_top(ke) = r_top(ke) + r_n(k)
               endif
            enddo
            if ( z_frac <= 1. ) cycle
            if ( ke == ktop+1 ) exit
            do k=ke-1,ktop+1,-1
               m_top(k) = m_top(k+1) - dm(k)
               r_top(k) = r_top(k+1) - r_n(k)
            enddo
            exit
444      continue
5000  continue
6000 continue
end do ! j_loop
!$acc end region
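
Stripped of the physics, the pattern I am after is roughly the following. This is only a minimal sketch, not the code that was compiled: the names a, out, col and s and the array sizes are made up for illustration, and the directive layout simply mirrors the one above. Whether the private clause belongs where I have put it is part of my question.

```fortran
program private_sketch
  implicit none
  ! Hypothetical sizes and names, for illustration only
  integer, parameter :: is = 1, ie = 4, js = 1, je = 3, km = 5
  real :: a(is:ie, js:je, km)   ! input field
  real :: res(is:ie, js:je)     ! one result per (i,j) column ("out" in the text)
  real :: col(km)               ! 1-D work array in k, intended to be private
  real :: s                     ! scalar, intended to be private
  integer :: i, j, k

  a = 1.0

!$acc region copyin(a) copyout(res)
!$acc do parallel independent private(col, s)
  do j = js, je
!$acc do parallel independent
     do i = is, ie
        ! The k loop is meant to run sequentially within each (i,j) column
        s = 0.0
        do k = 1, km
           col(k) = a(i,j,k) * real(k)
           s = s + col(k)
        end do
        res(i,j) = s
     end do
  end do
!$acc end region

  print *, res(is, js)
end program private_sketch
```

Without accelerator support enabled the directives are plain comments, so the sketch also builds and runs as ordinary serial Fortran. The messages further down are for the full routine, not for this sketch.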

The compiler messages are:

PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Unexpected flow graph (nh_core_cpu_mod.F90: 225)
riem_3d:
    228, Loop is parallelizable
    230, Loop is parallelizable
    232, Complex loop carried dependence of 'dts' prevents parallelization
         Loop carried dependence of 'dts' prevents parallelization
         Loop carried backward dependence of 'dts' prevents vectorization
         Loop carried reuse of 'pdt' prevents parallelization
         Loop carried dependence of 'r_n' prevents parallelization
         Complex loop carried dependence of 'r_n' prevents parallelization
         Loop carried backward dependence of 'r_n' prevents vectorization
         Complex loop carried dependence of 'k2' prevents parallelization
         Loop carried dependence of 'k2' prevents parallelization
         Loop carried backward dependence of 'k2' prevents vectorization
         Loop carried dependence of 'm_top' prevents parallelization
         Complex loop carried dependence of 'm_top' prevents parallelization
         Loop carried backward dependence of 'm_top' prevents vectorization
         Loop carried dependence of 'r_top' prevents parallelization
         Complex loop carried dependence of 'r_top' prevents parallelization
         Loop carried backward dependence of 'r_top' prevents vectorization
         Loop carried dependence of 'time_left' prevents parallelization
         Complex loop carried dependence of 'time_left' prevents parallelization
         Loop carried backward dependence of 'time_left' prevents vectorization
         Loop carried dependence of 'k2' prevents vectorization
         Loop carried dependence of 'm_top' prevents vectorization
         Loop carried dependence of 'r_top' prevents vectorization
         Loop carried dependence of 'time_left' prevents vectorization
         Loop carried scalar dependence for 'z_frac' at line 271
         Accelerator kernel generated
        228, !$acc do parallel ! blockidx%y
        230, !$acc do parallel, vector(256) ! blockidx%x threadidx%x
        232, !$acc do seq(256)
             Cached references to size [256] block of 'seq'
    236, Loop is parallelizable
    243, Loop is parallelizable
    250, Complex loop carried dependence of 'k2' prevents parallelization
         Loop carried reuse of 'k2' prevents parallelization
         Loop carried dependence of 'k2' prevents parallelization
         Loop carried backward dependence of 'k2' prevents vectorization
         Complex loop carried dependence of 'm_top' prevents parallelization
         Loop carried dependence of 'm_top' prevents parallelization
         Loop carried backward dependence of 'm_top' prevents vectorization
         Complex loop carried dependence of 'r_top' prevents parallelization
         Loop carried dependence of 'r_top' prevents parallelization
         Loop carried backward dependence of 'r_top' prevents vectorization
         Loop carried reuse of 'r_top' prevents parallelization
         Complex loop carried dependence of 'time_left' prevents parallelization
         Loop carried dependence of 'time_left' prevents parallelization
         Loop carried backward dependence of 'time_left' prevents vectorization
         Loop carried reuse of 'time_left' prevents parallelization
         Loop carried scalar dependence for 'z_frac' at line 271
    252, Complex loop carried dependence of 'time_left' prevents parallelization
         Scalar last value needed after loop for 'z_frac' at line 262
         Scalar last value needed after loop for 'z_frac' at line 263
         Scalar last value needed after loop for 'z_frac' at line 271
         Loop carried reuse of 'time_left' prevents parallelization
         Complex loop carried dependence of 'm_top' prevents parallelization
         Loop carried dependence of 'm_top' prevents parallelization
         Loop carried backward dependence of 'm_top' prevents vectorization
         Complex loop carried dependence of 'r_top' prevents parallelization
         Loop carried reuse of 'r_top' prevents parallelization
         Inner sequential loop scheduled on accelerator
    253, Accelerator restriction: induction variable live-out from loop: k
    266, Accelerator restriction: induction variable live-out from loop: k
    267, Accelerator restriction: induction variable live-out from loop: k
    268, Accelerator restriction: induction variable live-out from loop: k
    270, Accelerator restriction: induction variable live-out from loop: k
    273, Loop carried dependence of 'm_top' prevents parallelization
         Loop carried backward dependence of 'm_top' prevents vectorization
         Loop carried dependence of 'r_top' prevents parallelization
         Loop carried backward dependence of 'r_top' prevents vectorization
         Inner sequential loop scheduled on accelerator