# PGI attempts to parallelize sequential loop

Hi all!

1. In my code I have four nested loops. To avoid a reduction, I mark the two innermost loops as sequential. According to the compiler output, PGI still tries to parallelize the inner loops.

code:

``````
!$acc data copyout(scf) copyin(Dlocal,Clocal,endpht,lstpht,ilocal)
!$acc kernels
!  Loop over sub-points
!$acc loop independent
do ispin = 1,nspin  ! <-- 321
  !$acc loop independent private(ijl)
  do isp = 1,nsp    ! <-- 325
    !$acc loop seq
    do ic = 1,nc     ! <-- 328
      imp = endpht(ip-1) + ic
      i = lstpht(imp)
      il = ilocal(i)
      !$acc loop seq
      do jc = 1,ic   ! <-- 335
        jl = ilocal(lstpht(endpht(ip-1) + jc)) !ilc(jc)

        if (il.gt.jl) then
          ijl = il*(il+1)/2 + jl + 1
        else
          ijl = jl*(jl+1)/2 + il + 1
        endif
        if (ic .eq. jc) then
          Dij = Dlocal(ijl,ispin)
        else
          Dij = 2*Dlocal(ijl,ispin)
        endif

        scf(isp,ip,ispin) = scf(isp,ip,ispin) + &
          Dij*Clocal(isp,ic) * Clocal(isp,jc)    !Cij(isp)
      enddo
    enddo
  enddo
enddo
!$acc end kernels
!$acc end data
``````

output:

``````
pgfortran -c -acc -ta=nvidia:4.0 -g -Minfo   `FoX/FoX-config --fcflags`   scf.f90
rhoofd:
     94, maxval reduction inlined
    134, Possible copy in and copy out of dscfl in call to matdot
    202, Invariant if transformation
    304, sum reduction inlined
    319, Generating copyout(scf(:,:,:))
         Generating copyin(ilocal(:))
         Generating copyin(lstpht(:))
         Generating copyin(endpht(:))
         Generating copyin(clocal(:,:))
         Generating copyin(dlocal(:,:))
    320, Generating copyin(endpht(:))
         Generating copyin(dlocal(:,:))
         Generating copyin(lstpht(:))
         Generating copyin(ilocal(:))
         Generating copyout(scf(:,:,:))
         Generating copyin(clocal(:,:))
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
    323, Loop is parallelizable
    325, Loop is parallelizable
    328, Loop carried dependence of 'scf' prevents parallelization
         Loop carried backward dependence of 'scf' prevents vectorization
         Accelerator kernel generated
        323, !$acc loop gang ! blockidx%y
        325, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
        328, CC 1.3 : 27 registers; 224 shared, 4 constant, 0 local memory bytes
             CC 2.0 : 27 registers; 0 shared, 240 constant, 0 local memory bytes
    335, Complex loop carried dependence of 'scf' prevents parallelization
         Loop carried dependence of 'scf' prevents parallelization
         Loop carried backward dependence of 'scf' prevents vectorization
``````

2. Once again, about the confusing messages on line 320.

3. BTW, this piece of code produces different results when compiled with and without ‘-acc’. Any idea why?

Hi Alexey,

> 1. In my code I have four nested loops. To avoid a reduction, I mark the two innermost loops as sequential. According to the compiler output, PGI still tries to parallelize the inner loops.

The compiler is just printing out the analysis information, but isn’t actually parallelizing the inner two loops.
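A minimal sketch of one way to make the sequential intent of the inner loops explicit: accumulate into a private scalar and store to `scf` once per element, so the inner loops never read or write `scf`. The array sizes, the fill values, the simplified packed index, and the name `acc` are illustrative assumptions, not taken from the original code. Whether this changes what `-Minfo` prints is not confirmed anywhere in this thread.

```fortran
program seq_inner_loops
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nspin = 2, nsp = 8, nc = 4   ! illustrative sizes
  real(dp) :: Dlocal(nc*(nc+1)/2, nspin), Clocal(nsp, nc)
  real(dp) :: scf(nsp, nspin), acc, Dij
  integer  :: ispin, isp, ic, jc, ijl

  Dlocal = 1.0_dp          ! made-up input data
  Clocal = 0.5_dp

  !$acc kernels copyin(Dlocal, Clocal) copyout(scf)
  !$acc loop independent
  do ispin = 1, nspin
    !$acc loop independent private(acc, Dij, ijl)
    do isp = 1, nsp
      acc = 0.0_dp                       ! private accumulator, scf is not read
      !$acc loop seq
      do ic = 1, nc
        !$acc loop seq
        do jc = 1, ic
          ijl = ic*(ic-1)/2 + jc         ! simplified packed lower-triangle index
          Dij = Dlocal(ijl, ispin)
          if (ic /= jc) Dij = 2.0_dp*Dij ! off-diagonal terms count twice
          acc = acc + Dij*Clocal(isp, ic)*Clocal(isp, jc)
        end do
      end do
      scf(isp, ispin) = acc              ! single store per element
    end do
  end do
  !$acc end kernels

  ! With D = 1 and C = 0.5 every element is 0.25*nc*nc = 4.0
  if (any(abs(scf - 4.0_dp) > 1.0e-12_dp)) error stop "unexpected result"
  print '(a)', "PASS"
end program seq_inner_loops
```

Because every element of `scf` is fully overwritten here, `copyout` is sufficient in this sketch. Without `-acc` the directives are plain comments, so the same file also builds as an ordinary host program.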

> 2. Once again, about the confusing messages on line 320.

Yep. These are actually “present” checks, which allow for things like pointer swapping within data regions. The issue is being tracked as TPR#18858.
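For illustration only, the idea of a presence check can be written out with an explicit `present` clause on the compute region, which states that the arrays were already placed on the device by the enclosing data region. This is a sketch with made-up arrays `a` and `b`; the thread does not say whether adding `present` changes the messages reported for line 320.

```fortran
program present_clause
  implicit none
  integer, parameter :: n = 8      ! illustrative size
  real    :: a(n), b(n)
  integer :: i

  a = 2.0

  !$acc data copyin(a) copyout(b)
  ! present(...) asserts that a and b are already on the device,
  ! so the compute region requests no transfers of its own.
  !$acc kernels present(a, b)
  !$acc loop independent
  do i = 1, n
    b(i) = 3.0*a(i)
  end do
  !$acc end kernels
  !$acc end data

  if (any(b /= 6.0)) error stop "unexpected result"
  print '(a)', "ok"
end program present_clause
```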

> 3. BTW, this piece of code produces different results when compiled with and without ‘-acc’. Any idea why?

I’d need a reproducing example to tell. I’d start by simplifying things, though: remove the data region and the loop clauses, then add them back one at a time, starting with the outer loop and finishing with the data region.
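While adding clauses back one at a time, it can help to compare the accelerated result against a host reference inside the same program. The sketch below does that for a toy accumulation; the arrays `c`, `host`, and `dev` and their sizes are hypothetical. It also illustrates one thing that may be worth checking in the posted code, offered as an assumption rather than a diagnosis: `scf` is read on the right-hand side of the accumulation but is listed in `copyout`, and `copyout` does not transfer the host values to the device. The whole array is also copied back, although only the elements at index `ip` are updated. The sketch uses `copy` for the array that is read before it is written.

```fortran
program compare_acc_host
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 16, m = 4      ! illustrative sizes
  real(dp) :: c(n, m), host(n), dev(n)
  integer  :: i, k

  do k = 1, m
    do i = 1, n
      c(i, k) = real(i + k, dp)            ! made-up input data
    end do
  end do
  host = 1.0_dp                            ! both versions start from the same values
  dev  = 1.0_dp

  ! Host reference: plain loops, no directives.
  do i = 1, n
    do k = 1, m
      host(i) = host(i) + c(i, k)*c(i, k)
    end do
  end do

  ! Accelerated version: copy (not copyout) because dev is read before it is written.
  !$acc kernels copyin(c) copy(dev)
  !$acc loop independent
  do i = 1, n
    !$acc loop seq
    do k = 1, m
      dev(i) = dev(i) + c(i, k)*c(i, k)
    end do
  end do
  !$acc end kernels

  if (maxval(abs(dev - host)) > 1.0e-12_dp*maxval(abs(host))) error stop "results differ"
  print '(a)', "results match"
end program compare_acc_host
```

A relative tolerance is used because accelerator and host arithmetic are not guaranteed to round identically in general, even though this toy input happens to sum exactly.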

Hope this helps,
Mat

> 1. In my code I have four nested loops. To avoid a reduction, I mark the two innermost loops as sequential. According to the compiler output, PGI still tries to parallelize the inner loops.

> The compiler is just printing out the analysis information, but isn’t actually parallelizing the inner two loops.

In that case I’d suggest treating these as confusing messages. I explicitly marked those loops as seq, so I don’t want to see any info about them.

I’ve complained about this as well. The problem is that the analysis is done before the directives are applied. I’ll pass this along, though, since customer complaints tend to get higher priority than when I complain ;).

Mat