How to parallelize the outer loop only

Hi,
I try to add accelerator directives into a subroutine with nested loops. The limits of inner loops are variant in the outer loops. May I parallelize the outer loop only to overcome the restriction that inner loop limits must be constant? Attached please find the code. Any suggestion is appreciated.


      SUBROUTINE GRID_BASED_NEIGHBOR_SEARCH
      
      USE param1
      USE discretelement
      USE geometry
      USE des_bc

      IMPLICIT NONE
!-----------------------------------------------
! Local variables
!-----------------------------------------------
      INTEGER I, II, IP1, IM1   ! X-coordinate loop indices
      INTEGER J, JJ, JP1, JM1   ! Y-coordinate loop indices
      INTEGER K, KK, KP1, KM1   ! Z-coordinate loop indices
      INTEGER PNO ! Temp. particle number variable
      INTEGER NPG ! Temp. cell particle count
      INTEGER LL, NP, NEIGH_L  ! Loop Counters
      INTEGER NLIM ! Index of the next free slot in the neighbour list
      
      DOUBLE PRECISION DISTVEC(DIMN), DIST, R_LM ! Contact variables

!$acc region do kernel copy(neighbours) copyin(pijk,des_pos_new) &
!$acc       copyin(imin1,imax1,jmin1,jmax1,dimn,kmin1,kmax1)     &  
!$acc       copyin(des_radius,factor_rlm)
      DO LL = 1, MAX_PIS

         II = PIJK(LL,1); IP1=min(II+1,imax1); IM1=max(II-1,imin1)
         JJ = PIJK(LL,2); JP1=min(JJ+1,jmax1); JM1=max(JJ-1,jmin1)
         KK = PIJK(LL,3); KP1=KK;   KM1=KK
         IF(DIMN.EQ.3)THEN 
            KP1 = min(KK+1,kmax1);   KM1 = max(KK-1,kmin1)
         ENDIF

         DO KK = KM1, KP1
            DO JJ = JM1, JP1
               DO II = IM1, IP1
! Shift loop index to new variables for manipulation
                  I = II;   J = JJ;   K = KK
! If cell IJK contains particles, store the amount in NPG
                  IF(ASSOCIATED(PIC(I,J,K)%P))THEN
                     NPG = SIZE(PIC(I,J,K)%P)
                  ELSE
                     NPG = 0
                  ENDIF

! Loop over the particles in IJK cell to determine if they are
! neighbors to particle LL
                  DO NP = 1,NPG
                     PNO = PIC(I,J,K)%P(NP)

                     IF(PNO.GT.LL)THEN 
                        R_LM = DES_RADIUS(LL) + DES_RADIUS(PNO)
                        R_LM = FACTOR_RLM*R_LM
                        DISTVEC(:) = DES_POS_NEW(PNO,:) - DES_POS_NEW(LL,:)
                        if(dimn.eq.2)then
                           dist=sqrt(distvec(1)**2+distvec(2)**2)
                        else
                           dist=sqrt(distvec(1)**2+distvec(2)**2+distvec(3)**2)
                        endif

                        IF(DIST .LE. R_LM) then
                            NEIGHBOURS(LL,1) = NEIGHBOURS(LL,1) + 1
                            NLIM  = NEIGHBOURS(LL,1) + 1
                            NEIGHBOURS(LL,NLIM) = PNO
                        ENDIF  !contact condition
                     ENDIF  !PNO.GT.LL
                  ENDDO  !NP

               ENDDO  ! II cell loop
            ENDDO  ! JJ cell loop
         ENDDO  ! KK cell loop

      ENDDO  ! Particles in system loop
!$acc end region

      END SUBROUTINE GRID_BASED_NEIGHBOR_SEARCH


Hi Tingwen,

You should be able to work around the rectangular loop restriction using the “kernel” clause (as you have it now). However, you’ll need to remove “ASSOCIATED” since it isn’t supported on the GPU. Also, you’ll need to privatize DISTVEC (i.e., add “private(DISTVEC)” to your kernel clause).
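
For example, the directive would look something like the sketch below, keeping your copy/copyin clauses. The ASSOCIATED/SIZE test itself would be replaced by something the GPU can evaluate, e.g. a cell particle count computed on the host before the region (PIC_NPG below is just a hypothetical name for such an array):

!$acc region do kernel private(DISTVEC)                          &
!$acc       copy(neighbours) copyin(pijk,des_pos_new)            &
!$acc       copyin(imin1,imax1,jmin1,jmax1,dimn,kmin1,kmax1)     &
!$acc       copyin(des_radius,factor_rlm)
      DO LL = 1, MAX_PIS
! ... cell loops as before, but with the ASSOCIATED/SIZE test replaced
! by a lookup of a cell particle count filled on the host beforehand
! (PIC_NPG is a hypothetical helper array):
!                 NPG = PIC_NPG(I,J,K)
      ENDDO
!$acc end region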

If I’ve missed anything, please let me know by posting the output from your compile with “-Minfo=accel”.

Hope this helps,
Mat

Hi Mat,
Thanks for your prompt reply. I made the changes and commented out the “ASSOCIATED” call by setting NPG and PNO to constants. The change is sketched below, followed by the output when I compile with PGI 10.3. Do you have any idea what is wrong? Thanks
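
The workaround looks roughly like this (the hard-coded values are just placeholders to let the region compile):

! ASSOCIATED/SIZE are not supported on the GPU, so NPG and PNO are set
! to placeholder constants for this compile test
!                 IF(ASSOCIATED(PIC(I,J,K)%P))THEN
!                    NPG = SIZE(PIC(I,J,K)%P)
!                 ELSE
!                    NPG = 0
!                 ENDIF
                  NPG = 1

                  DO NP = 1,NPG
!                    PNO = PIC(I,J,K)%P(NP)
                     PNO = 1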

     22, No parallel kernels found, accelerator region ignored
     25, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
     55, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
         Loop carried dependence due to exposed use of 'neighbours(i1+1,1)' prevents parallelization
     56, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
         Loop carried dependence due to exposed use of 'neighbours(i1+1,1)' prevents parallelization
     57, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
         Loop carried dependence due to exposed use of 'neighbours(i1+1,1)' prevents parallelization
     70, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
         Complex loop carried dependence of 'neighbours' prevents parallelization
         Loop carried dependence due to exposed use of 'neighbours(i1+1,1)' prevents parallelization
     76, Loop is parallelizable

Hi Tingwen,

Did you add the private clause for DISTVEC?

Try replacing your “$acc region” lines with a simpler version:

!$acc region
!$acc do kernel private(DISTVEC)
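
Applied to your loop, the structure would be roughly:

!$acc region
!$acc do kernel private(DISTVEC)
      DO LL = 1, MAX_PIS
! ... loop body unchanged from your original post ...
      ENDDO
!$acc end region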

This works for me, but I did have to modify your code to work around your modules. It’s possible my changes affected the behavior. If this is the case, please send the full source to PGI Customer Support (trs@pgroup.com) and ask them to send it on to me.

Mat

Hi Mat,
Many thanks. I figured out a way to do it by replacing the vector with three scalars. Now it compiles successfully. Really appreciate your help.
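
Roughly, the change looks like this (DISTX, DISTY, and DISTZ are just the names I use here for the three scalars replacing DISTVEC):

      DOUBLE PRECISION DISTX, DISTY, DISTZ   ! scalars replacing DISTVEC(DIMN)

! ... inside the NP loop ...
                        DISTX = DES_POS_NEW(PNO,1) - DES_POS_NEW(LL,1)
                        DISTY = DES_POS_NEW(PNO,2) - DES_POS_NEW(LL,2)
                        DISTZ = 0.0D0
                        IF(DIMN.EQ.3) DISTZ = DES_POS_NEW(PNO,3) - DES_POS_NEW(LL,3)
                        DIST = SQRT(DISTX**2 + DISTY**2 + DISTZ**2)

Since scalars inside the loop are privatized automatically, the loop-carried dependence on distvec(1:3) goes away without needing the private clause.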

Tingwen