I wonder whether a parallel reduction loop was actually generated.
Here is part of my code:
655 !$acc loop independent gang vector(16)
656 do i=its,ite
657 !$acc loop independent gang vector(16)
658 do j=jts,jte
659 LT = 0
660 !$acc loop reduction(ior:LT)
661 do K=KTS,KTE
662 LT_ = 1
...
1764 200 CONTINUE
1765 LT = IOR(LT,LT_)
1766 END DO
1767 LTRUE(j,i) = LT
1768 end do
1769 end do
With line 660 commented out, PGI reports:
656, Loop is parallelizable
658, Loop is parallelizable
Accelerator kernel generated
656, !$acc loop gang, vector(16) ! blockidx%x threadidx%x
658, !$acc loop gang, vector(16) ! blockidx%y threadidx%y
661, Scalar last value needed after loop for 'lt' at line 1767
Accelerator restriction: scalar variable live-out from loop: lt
Inner sequential loop scheduled on accelerator
That is understandable: without the reduction clause, LT is live-out, so the K loop runs sequentially.
With line 660 uncommented, PGI reports:
656, Loop is parallelizable
658, Loop is parallelizable
Accelerator kernel generated
656, !$acc loop gang, vector(16) ! blockidx%x threadidx%x
658, !$acc loop gang, vector(16) ! blockidx%y threadidx%y
661, Loop is parallelizable
No other information about the nested loop is printed.
The result is correct, but:
the execution time of the kernel does not change
PGI_ACC_DEBUG shows that no reduction is present:
Function 0 = 0x2a1f040 = morr_two_moment_micro_658_gpu
658 = lineno
16x16x1 = block size
-1x-2x1 = grid size
1x1x1 = unroll
0 = shared memory
0 = reduction shared memory
0 = reduction arg
0 = reduction bytes
840 = argument bytes
0 = max argument bytes
2 = size arguments
So why did PGI fail to generate the reduction operation? Are there any restrictions?
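In case it helps to reproduce this, here is a minimal sketch of the same loop structure. The kernels region, the array bounds, and the statement that sets LT_ are placeholders made up for illustration (assumptions, not the real code); the actual K loop body is about a thousand lines long.

```fortran
! Minimal sketch of the loop nest in question.
! ASSUMPTIONS: the enclosing kernels region, the bounds, and the way LT_ is
! computed are hypothetical stand-ins for the real (much longer) code.
program ior_reduction_sketch
  implicit none
  integer, parameter :: its = 1, ite = 32
  integer, parameter :: jts = 1, jte = 32
  integer, parameter :: kts = 1, kte = 64
  integer :: ltrue(jts:jte, its:ite)
  integer :: i, j, k, lt, lt_

  !$acc kernels copyout(ltrue)
  !$acc loop independent gang vector(16)
  do i = its, ite
     !$acc loop independent gang vector(16)
     do j = jts, jte
        lt = 0
        ! Bitwise-OR reduction over the vertical levels
        !$acc loop reduction(ior:lt)
        do k = kts, kte
           ! Placeholder for the elided body: flag is set at one level only
           lt_ = merge(1, 0, k == j)
           lt = ior(lt, lt_)
        end do
        ltrue(j, i) = lt
     end do
  end do
  !$acc end kernels

  ! Print a checksum so results with and without the clause can be compared
  print *, 'sum(ltrue) =', sum(ltrue)
end program ior_reduction_sketch
```

Without OpenACC enabled the directives are plain comments, so the same source also builds as a serial reference for checking the result.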
Alexey