The differences between the ‘kernels’ and the ‘parallel’ directives are still not clear to me, so I’m trying both.
I had the impression that with the ‘parallel’ one I would have more control, but if I try to specify the number of gangs/workers/vectors I get the error:
PGF90-S-0533-Clause ‘Worker(value)’ not allowed in ‘Parallel Loop’ directive .
If I change it to ‘kernels’, as in the following, there is no problem. Is there any reason why I cannot do this within the ‘parallel’ region?
36 !$acc kernels present(zc)
37 !$acc loop gang(9) collapse(2)
38 do k=kmin,kmax
39   do kp=kmin,kmax
40     k2=2*k
41     km = MIN(k,kp)
42     kp2=2*kp
43     z0=3.d0*dble(ju2+1)*dsqrt(dble(k2+1))*dsqrt(dble(kp2+1))
44     do q=-km,km
45       q2=2*q
46
47       ! Calculate quantity C and its sum over magnetic quantum numbers
48       !$acc loop worker(16) collapse(2)
49       do mu2=-ju2,ju2,2
50         do ml2=-jl2,jl2,2
51           p2=mu2-ml2
52           if(abs(p2).gt.2) cycle
53           z1=w3js(ju2,jl2,2,mu2,-ml2,-p2)
54           !$acc loop vector(32)
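For comparison, here is a minimal sketch of how the same sizes might be requested with ‘parallel’. It assumes the usual OpenACC convention that inside a ‘parallel’ region the sizes are given once on the directive itself (num_gangs, num_workers, vector_length), and the ‘loop’ clauses only name the level without a value. The module, the function name level_sum and the arithmetic are hypothetical stand-ins, not the real zc computation.

```fortran
module parallel_sizes_sketch
  implicit none
contains

  ! Hypothetical stand-in for the real kernel: out(i,j) = sum over k of (i+j+k),
  ! and the function returns the sum of all out(i,j).
  function level_sum(n, m) result(total)
    integer, intent(in) :: n, m
    double precision :: total
    double precision :: out(n, n)
    double precision :: acc
    integer :: i, j, k

    ! Sizes are set on the parallel directive ...
    !$acc parallel num_gangs(9) num_workers(16) vector_length(32) copyout(out)
    ! ... and each loop clause only names its level, with no value.
    !$acc loop gang
    do j = 1, n
      !$acc loop worker private(acc)
      do i = 1, n
        acc = 0.d0
        !$acc loop vector reduction(+:acc)
        do k = 1, m
          acc = acc + dble(i + j + k)
        end do
        out(i, j) = acc
      end do
    end do
    !$acc end parallel

    total = sum(out)
  end function level_sum

end module parallel_sizes_sketch

program demo
  use parallel_sizes_sketch
  implicit none
  ! Without OpenACC enabled the directives are plain comments,
  ! so this also builds and runs serially.
  print *, level_sum(3, 4)
end program demo
```

The point of the sketch is only the placement of the sizes: writing worker(16) or vector(32) on a ‘loop’ inside ‘parallel’ is what triggers the error above, while the same values on the ‘parallel’ line should be accepted.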
Another question. With the above code, the compiler gives the following info:
zcs:
     36, Generating present(zc(:,:,:,:,:,:,:))
     38, Loop is parallelizable
     39, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
         38, !$acc loop gang(9) collapse(2) ! blockidx%x
         39,   ! blockidx%x collapsed
         44, !$acc loop seq
         49, !$acc loop worker(16) collapse(2) ! threadidx%y
         50,   ! threadidx%y collapsed
         55, !$acc loop vector(32) ! threadidx%x
         60, !$acc loop seq
     44, Loop carried dependence of zc prevents parallelization
         Loop carried backward dependence of zc prevents vectorization
     49, Loop is parallelizable
     50, Loop is parallelizable
     55, Loop is parallelizable
     60, Loop is parallelizable
And again, I’m not sure how to interpret the output. The compiler says “Loop is parallelizable” for all the DO loops. I understand that this is only the analysis stage, so it says that it could be parallelized, but it doesn’t mean that it has generated parallel code.
The loops at lines 38-39 have been parallelized correctly, as it says “Accelerator kernel generated” and “Generating Tesla code”. Within the output for line 39, it also tells me that the loops at lines 44 and 60 will run sequentially, but that the compiler was able to generate worker parallelism for loops 49-50 and vector parallelism for loop 55.
But how do I interpret the rest of the lines?
44, Loop carried dependence of zc prevents parallelization
For some reason it thinks that there is a dependence on zc and cannot parallelize this loop. That’s OK as long as the other loops where I put an !$acc loop get parallelized, since I know that there are no dependencies. Is this message warning that no parallelization will be done at this and lower levels, or only for this particular loop?
49, 50, 55, 60: Loop is parallelizable. So was no parallel code generated for these loops? Within the info for loop 39 it looks like it was generated, but these messages confuse me.
Sorry to nitpick, but I want to understand as best as possible all the information that the compiler provides, in order to aim for the best possible performance.