Hi,
I’ve been trying to use OpenACC with a simple code, which you can find (as a tar archive) at FileNurse.
As it stands, the parallelization is done with !$acc kernels, and the non-OpenACC and OpenACC versions produce identical results:
[angelv@deimos]$ source comp.sh
[angelv@deimos]$ diff zcs.cpu zcs.acc
[angelv@deimos]$
[...]
Because I want more control over how the loops in the subroutine zcs are parallelized, I tried replacing the !$acc kernels directive with !$acc parallel and !$acc loop directives. The changes are minimal:
[angelv@deimos]$ diff rii.f90 rii_parallel.f90
36c36,37
< !$acc kernels
---
> !$acc parallel
> !$acc loop private(k,kp,k2,km,kp2,z0,q,q2)
47c48
<
---
> !$acc loop private(mu2,ml2,p2,z1,mup2,pp2,z2,mlp2,ps2,pt2,z3,z4,z5,z6,z7)
79c80
< !$acc end kernels
---
> !$acc end parallel
[angelv@deimos]$
But if I use rii_parallel.f90 instead of rii.f90, the resulting zc matrix is very different from the one produced by both the CPU and the !$acc kernels versions. I guess I don’t fully understand how to nest several loops, and/or what private really does. Any suggestions to help me properly understand what is going on?
Thanks,
AdV