I’ve been trying to use OpenACC with a simple code, which you can find (tar’ed) at https://goo.gl/XRrXR6.
As it stands, the parallelization is done with !$acc kernels, and the results of the non-OpenACC and OpenACC versions are identical:
[angelv@deimos]$ source comp.sh
[angelv@deimos]$ diff zcs.cpu zcs.acc
[angelv@deimos]$
[...]
Because I want more control over how the loops in the subroutine zcs are parallelized, I tried replacing the !$acc kernels directive with !$acc parallel and !$acc loop directives. The changes are minimal:
[angelv@deimos]$ diff rii.f90 rii_parallel.f90
36c36,37
< !$acc kernels
---
> !$acc parallel
> !$acc loop private(k,kp,k2,km,kp2,z0,q,q2)
47c48
<
---
> !$acc loop private(mu2,ml2,p2,z1,mup2,pp2,z2,mlp2,ps2,pt2,z3,z4,z5,z6,z7)
79c80
< !$acc end kernels
---
> !$acc end parallel
[angelv@deimos]$
But if I use rii_parallel.f90 instead of rii.f90, the resulting zc matrix is very different from what the other two versions produce. I guess I don’t fully understand how to nest several loops, and/or what private really does. Any suggestions to help me properly understand what is going on?
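To make the question concrete without the full source, here is a stripped-down sketch of the directive layout I am describing: one !$acc parallel region with an outer !$acc loop and a nested inner !$acc loop, each with its own private list. The program name, the array a, the scalars row_scale and tmp, and the loop bounds are all placeholders made up for illustration; none of them come from rii.f90, and the real loops in zcs do considerably more work per iteration.

```fortran
program nest_sketch
  implicit none
  integer, parameter :: n = 4, m = 3   ! placeholder sizes, not from rii.f90
  integer :: i, j
  real :: row_scale, tmp               ! placeholder scalars
  real :: a(n, m)                      ! placeholder output array

  !$acc parallel copyout(a)
  ! Outer loop: row_scale is set once per outer iteration,
  ! so it is listed as private on the outer loop directive.
  !$acc loop private(row_scale)
  do i = 1, n
     row_scale = real(i)
     ! Inner loop: tmp is scratch for a single inner iteration,
     ! so it is listed as private on the inner loop directive.
     !$acc loop private(tmp)
     do j = 1, m
        tmp = row_scale * real(j)
        a(i, j) = tmp                  ! each element is written exactly once
     end do
  end do
  !$acc end parallel

  print *, a
end program nest_sketch
```

In this sketch every element of a is written by exactly one (i, j) iteration, so the iterations are independent. I am not sure whether the same holds for the way zc is updated in my real code, which may be part of what I am missing.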