Thanks a lot for taking a look at this!
It might be a simple question, but how can I parallelize the code below using OpenACC? I think I might need to use atomic, but I'm not sure.
do iz = 1, zn
  do it = 1, tn
    do iiz = 1, zn
      do iit = 1, tn
        a(iiz,iit,:,:) = a(iiz,iit,:,:) + b(iz,it,:,:) * Z(iz,iiz) * T(it,iit)
      end do
    end do
  end do
end do
The problem here is that the “iz” and “it” loops aren’t parallel: every (iz, it) iteration updates the same elements of “a”. Also, atomic only works on a single reference, not on array syntax, which the compiler expands into implicit loops. Even if it did, atomic would severely hurt your performance since every update would need an atomic operation.
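To make the atomic point concrete, here is a minimal sketch of what an atomic version would have to look like: the array syntax is written out as explicit loops so that each atomic protects one scalar element. The sizes, the extents n3 and n4 (standing in for whatever sits behind the “:,:”), the loop indices k and l, and the data clauses are all assumptions for illustration, and I have not timed this on a GPU. It is shown only to illustrate the cost, not as a recommendation.

```fortran
program atomic_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical sizes; n3 and n4 stand in for the extents hidden behind (:,:)
  integer, parameter :: zn = 4, tn = 3, n3 = 2, n4 = 2
  real(dp) :: a(zn,tn,n3,n4), a_ref(zn,tn,n3,n4), b(zn,tn,n3,n4)
  real(dp) :: Z(zn,zn), T(tn,tn)
  integer :: iz, it, iiz, iit, k, l

  call random_number(b)
  call random_number(Z)
  call random_number(T)
  a = 0.0_dp
  a_ref = 0.0_dp

  ! Reference: the original array-syntax form, run serially
  do iz = 1, zn
    do it = 1, tn
      do iiz = 1, zn
        do iit = 1, tn
          a_ref(iiz,iit,:,:) = a_ref(iiz,iit,:,:) + b(iz,it,:,:) * Z(iz,iiz) * T(it,iit)
        end do
      end do
    end do
  end do

  ! All six loops run in parallel, so different (iz,it) pairs race on the
  ! same element of a; each scalar update therefore needs its own atomic.
  !$acc parallel loop collapse(6) copy(a) copyin(b, Z, T)
  do iz = 1, zn
    do it = 1, tn
      do iiz = 1, zn
        do iit = 1, tn
          do l = 1, n4
            do k = 1, n3
              !$acc atomic update
              a(iiz,iit,k,l) = a(iiz,iit,k,l) + b(iz,it,k,l) * Z(iz,iiz) * T(it,iit)
            end do
          end do
        end do
      end do
    end do
  end do

  ! Summation order may differ in parallel, so compare with a tolerance
  if (maxval(abs(a - a_ref)) > 1.0e-12_dp) error stop "atomic version disagrees"
  print *, "max abs difference:", maxval(abs(a - a_ref))
end program atomic_sketch
```

Note that this issues zn*tn atomic updates for every element of “a”, which is where the performance penalty comes from. Without OpenACC enabled the directives are treated as comments and the program runs serially.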
If this is the only code in this loop nest, I’d suggest using explicit loops instead of array syntax, then moving the “iz” and “it” loops to be the innermost loops. You can then optionally use a reduction on the inner loops, depending on whether you need more parallelism or need each kernel to do more work.
!$acc kernels loop ! Try adding collapse(4) if zn,tn are small
do iiz = 1, zn
  do iit = 1, tn
    do iiiz = 1, zn ! set the correct loop bounds
      do iiit = 1, tn ! set the correct loop bounds
        asum = 0.0d0
        !!!! optionally try using a reduction
        !!!! Also using vector here may help the data access for b, Z, and T
        !!!$acc loop vector collapse(2) reduction(+:asum)
        do iz = 1, zn
          do it = 1, tn
            asum = asum + b(iz,it,iiiz,iiit) * Z(iz,iiz) * T(it,iit)
          end do
        end do
        a(iiz,iit,iiiz,iiit) = a(iiz,iit,iiiz,iiit) + asum
      end do
    end do
  end do
end do
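If you want to convince yourself that moving “iz” and “it” to the inside gives the same answer, a small self-checking program along these lines may help. It is only a sketch: the sizes, the extents n3 and n4 used for the third and fourth dimensions, the choice of “parallel loop” with collapse(4), and the data clauses are my assumptions, so adjust them to your real code.

```fortran
program reorder_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical sizes; n3 and n4 are the assumed extents of dimensions 3 and 4
  integer, parameter :: zn = 4, tn = 3, n3 = 2, n4 = 2
  real(dp) :: a(zn,tn,n3,n4), a_ref(zn,tn,n3,n4), b(zn,tn,n3,n4)
  real(dp) :: Z(zn,zn), T(tn,tn), asum
  integer :: iz, it, iiz, iit, iiiz, iiit

  call random_number(b)
  call random_number(Z)
  call random_number(T)
  a = 0.0_dp
  a_ref = 0.0_dp

  ! Reference: the original loop order with array syntax, run serially
  do iz = 1, zn
    do it = 1, tn
      do iiz = 1, zn
        do iit = 1, tn
          a_ref(iiz,iit,:,:) = a_ref(iiz,iit,:,:) + b(iz,it,:,:) * Z(iz,iiz) * T(it,iit)
        end do
      end do
    end do
  end do

  ! Reordered: each element of a is owned by one iteration of the four outer
  ! loops, and the iz/it sum is accumulated in a private scalar.
  !$acc parallel loop collapse(4) private(asum) copy(a) copyin(b, Z, T)
  do iiz = 1, zn
    do iit = 1, tn
      do iiiz = 1, n3
        do iiit = 1, n4
          asum = 0.0_dp
          !$acc loop vector collapse(2) reduction(+:asum)
          do iz = 1, zn
            do it = 1, tn
              asum = asum + b(iz,it,iiiz,iiit) * Z(iz,iiz) * T(it,iit)
            end do
          end do
          a(iiz,iit,iiiz,iiit) = a(iiz,iit,iiiz,iiit) + asum
        end do
      end do
    end do
  end do

  ! A reduction may sum in a different order, so compare with a tolerance
  if (maxval(abs(a - a_ref)) > 1.0e-12_dp) error stop "reordered version disagrees"
  print *, "max abs difference:", maxval(abs(a - a_ref))
end program reorder_check
```

Because each element of “a” is written by exactly one iteration of the outer loops, no atomic is needed. Built without OpenACC, the directives are ignored and the same check runs serially.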