PGI attempts to parallelize sequential loop

Hi all!

  1. In my code I have four nested loops. To avoid a reduction I mark the two innermost loops as sequential. According to the compiler output, PGI tries to parallelize the inner loops.

code:

!$acc data copyout(scf) copyin(Dlocal,Clocal,endpht,lstpht,ilocal)
!$acc kernels
!  Loop over sub-points
!$acc loop independent
        do ispin = 1,nspin  ! <-- 323
!$acc loop independent private(ijl)
           do isp = 1,nsp    ! <-- 325
!$acc loop seq
              do ic = 1,nc     ! <-- 328
                 imp = endpht(ip-1) + ic
                 i = lstpht(imp)
                 il = ilocal(i)
!$acc loop seq
                 do jc = 1,ic   ! <-- 335
                    jl = ilocal(lstpht(endpht(ip-1) + jc)) !ilc(jc)

                    if (il.gt.jl) then
                       ijl = il*(il+1)/2 + jl + 1
                    else
                       ijl = jl*(jl+1)/2 + il + 1
                    endif
                    if (ic .eq. jc) then
                       Dij = Dlocal(ijl,ispin)
                    else
                       Dij = 2*Dlocal(ijl,ispin)
                    endif

                    scf(isp,ip,ispin) = scf(isp,ip,ispin) + &
                        Dij*Clocal(isp,ic) * Clocal(isp,jc)    !Cij(isp)
                 enddo
              enddo
           enddo
        enddo
!$acc end kernels
!$acc end data

output:

pgfortran -c -acc -ta=nvidia:4.0 -g -Minfo   `FoX/FoX-config --fcflags`   scf.f90
rhoofd:
     94, maxval reduction inlined
    134, Possible copy in and copy out of dscfl in call to matdot
    202, Invariant if transformation
    304, sum reduction inlined
    319, Generating copyout(scf(:,:,:))
         Generating copyin(ilocal(:))
         Generating copyin(lstpht(:))
         Generating copyin(endpht(:))
         Generating copyin(clocal(:,:))
         Generating copyin(dlocal(:,:))
    320, Generating copyin(endpht(:))
         Generating copyin(dlocal(:,:))
         Generating copyin(lstpht(:))
         Generating copyin(ilocal(:))
         Generating copyout(scf(:,:,:))
         Generating copyin(clocal(:,:))
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
    323, Loop is parallelizable
    325, Loop is parallelizable
    328, Loop carried dependence of 'scf' prevents parallelization
         Loop carried backward dependence of 'scf' prevents vectorization
         Accelerator kernel generated
        323, !$acc loop gang ! blockidx%y
        325, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
        328, CC 1.3 : 27 registers; 224 shared, 4 constant, 0 local memory bytes
             CC 2.0 : 27 registers; 0 shared, 240 constant, 0 local memory bytes
    335, Complex loop carried dependence of 'scf' prevents parallelization
         Loop carried dependence of 'scf' prevents parallelization
         Loop carried backward dependence of 'scf' prevents vectorization
  1. Once again, about the confusing messages on line 320.

  2. BTW, this piece of code produces different results when compiled with and without ‘-acc’. Any ideas?

Hi Alexey,

  1. In my code I have four nested loops. To avoid a reduction I mark the two innermost loops as sequential. According to the compiler output, PGI tries to parallelize the inner loops.

The compiler is just printing out the analysis information, but isn’t actually parallelizing the inner two loops.
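
For reference, here is a sketch of the schedule the -Minfo output above reports, written out as explicit clauses. Only the directive skeleton is shown, and I'm assuming the clauses map one-to-one to the reported schedule; the loop bodies are exactly as posted:

!$acc kernels
!$acc loop independent gang                            ! 323: blockidx%y
      do ispin = 1,nspin
!$acc loop independent gang vector(128) private(ijl)   ! 325: blockidx%x, threadidx%x
         do isp = 1,nsp
!$acc loop seq                                         ! 328: left sequential
            do ic = 1,nc
!$acc loop seq                                         ! 335: left sequential
               do jc = 1,ic
                  ! ... body exactly as posted above ...
               enddo
            enddo
         enddo
      enddo
!$acc end kernels

The inner two loops get no gang or vector clause at all, which is why they still run sequentially within each thread.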

  1. Once again, about the confusing messages on line 320.

Yep. These are actually “present” checks to allow for things like pointer swapping within data regions. The issue is being tracked as TPR#18858.
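
If you want to make the intent explicit in the meantime, one option is to assert presence on the compute construct so the enclosing data region is clearly the one responsible for the transfers. A sketch of the directive change only (the loop nest is unchanged, and whether it quiets those particular messages depends on the compiler version):

!$acc data copyout(scf) copyin(Dlocal,Clocal,endpht,lstpht,ilocal)
!$acc kernels present(scf,Dlocal,Clocal,endpht,lstpht,ilocal)
!     ... the four nested loops exactly as posted ...
!$acc end kernels
!$acc end data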

  1. BTW, this piece of code produces different results when compiled with and without ‘-acc’. Any ideas?

I’d need a reproducing example to tell. Though, I’d start by simplifying things: remove the data region and the loop clauses, then add them back one by one, starting with the outer loop and finishing with the data region.
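
A sketch of that progression (only the directives change between steps; the loop nest itself stays exactly as posted):

! Step 1: bare compute region only -- no data region, no loop directives.
!$acc kernels
!        (four nested loops exactly as posted)
!$acc end kernels

! Step 2: re-add "!$acc loop independent" on the ispin loop, rebuild, and
!         compare the results against the host (no -acc) build.
! Step 3: re-add "!$acc loop independent private(ijl)" on the isp loop and
!         the two "!$acc loop seq" directives, rebuild, compare again.
! Step 4: finally wrap everything back in the "!$acc data" region and compare.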

Hope this helps,
Mat

  1. In my code I have four nested loops. To avoid a reduction I mark the two innermost loops as sequential. According to the compiler output, PGI tries to parallelize the inner loops.

The compiler is just printing out the analysis information, but isn’t actually parallelizing the inner two loops.

In this case I’d consider these messages confusing: I marked those loops as seq explicitly, so I don’t want to see any parallelization info about them.

I’ve complained about this as well. The problem is that the analysis is done before the directives are applied. Though, I’ll pass this along since customer complaints tend to get higher priority than when I complain ;).

- Mat