# Complex loop that worked in 18.4 not accel in 18.7

Hi,

Here is another example of code not being parallelized in 18.7 when it was in 18.4 and before.

The loop is:

``````
!$acc parallel default(present) present(fj,b)
!$acc loop
      do k=2,npm1
!$acc loop
        do j=2,ntm1
!$acc loop
          do i=1,nrm1
            fj%r(i,j,k)=( ( st(j  )*b%p(i,j  ,k)                     -st
     &(j-1)*b%p(i,j-1,k))*dth_i(j)                   -(b%t(i,j,k)-b%t(i,
     &j,k-1))*dp_mult*dph_i(k)                  )*r_i(i)*sth_i(j)
          enddo
        enddo
      enddo
c
!$acc loop
      do k=2,npm1
!$acc loop
        do j=jm0,jm1
          do i=2,nrm1
        fj%t(i,j,k)=( (b%r(i,j,k)-b%r(i,j,k-1))*dp_mult*dph_i(k)*st_i(j)
     &                   -( r(i  )*b%p(i  ,j,k)                     -r(i
     &-1)*b%p(i-1,j,k))*drh_i(i)                  )*rh_i(i)
          enddo
          fj%t( 1,j,k)=(fj%t(2,j,k)+(fj%t(2,j,k)-fj%t(2+1,j,k))*dr(2-1)*
     &dr_i(2))
          fj%t(nr,j,k)=(fj%t(nrm1,j,k)+(fj%t(nrm1,j,k)-fj%t(nrm1-1,j,k))
     &*dr(nrm1)*dr_i(nrm1-1))
        enddo
      enddo
...
``````

and the compiler says:

``````
  18274, Generating present(fj)
         Generating implicit present(dr_i(:),st(1:ntm1),r_i(1:nrm1),dr(:),dph_i(2:npm1),sth_i(2:ntm1),st_i(jm0:jm1),r(1:nrm1),drh_i(2:nrm1),rh_i(2:nrm1))
         Generating present(b)
  18278, Loop is parallelizable
  18280, Loop is parallelizable
  18291, Loop is parallelizable
  18292, Complex loop carried dependence of b%r$p,b%p$p,r,fj%t$p prevents parallelization
  18307, Loop is parallelizable
  18308, Complex loop carried dependence of b%t$p,r,b%r$p,fj%p$p prevents parallelization
set_pole_bc_avec_acc:
``````

I know having those two boundary lines in the second loop is strange, but it seemed to work before. Is there a better way to do this?

- Ron

Hi Ron,

Can you post the complete compiler feedback messages for this loop? Also, what are the line numbers for this code? (So I can correlate them to the feedback).

My assumption is that the “Complex loop carried dependence” messages are for the loops that don’t have an “acc loop” directive on them. Hence, the compiler is applying loop dependency analysis, but since the variables are pointers, it can’t auto-parallelize them.

What’s missing from the information you posted is whether the compiler still successfully offloaded and parallelized the loops decorated with “acc loop”.

-Mat
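
To illustrate the distinction Mat describes, here is a minimal free-form sketch, not Ron's actual code. The arrays `a` and `b` and the sizes `nr`, `nt`, `np` are hypothetical stand-ins for the `fj` and `b` fields, and plain arrays are used instead of derived-type pointer members. Inside a `parallel` construct, a `loop` directive without `seq` or `auto` asserts that the iterations are independent, so the compiler does not have to prove it through dependence analysis. The sketch assumes the inner iterations really are independent, which is the programmer's responsibility to guarantee.

```fortran
program explicit_inner_loop
  implicit none
  ! Hypothetical sizes and arrays, chosen only for illustration.
  integer, parameter :: nr = 8, nt = 6, np = 5
  integer :: i, j, k
  real :: a(nr,nt,np), b(nr,nt,np)

  ! Fill the source field with arbitrary data on the host.
  do k = 1, np
    do j = 1, nt
      do i = 1, nr
        b(i,j,k) = real(i + 10*j + 100*k)
      enddo
    enddo
  enddo
  a = 0.0

  !$acc parallel copyin(b) copy(a)
  !$acc loop
  do k = 2, np
    !$acc loop
    do j = 1, nt
      ! Explicit loop directive: inside a parallel construct this
      ! asserts that the i iterations are independent.
      !$acc loop
      do i = 2, nr-1
        a(i,j,k) = b(i,j,k) - b(i,j,k-1)
      enddo
      ! Boundary values, extrapolated from the interior points after
      ! the i loop has finished (same structure as the original nest).
      a(1,j,k)  = 2.0*a(2,j,k) - a(3,j,k)
      a(nr,j,k) = 2.0*a(nr-1,j,k) - a(nr-2,j,k)
    enddo
  enddo
  !$acc end parallel

  print *, a(1,1,2), a(nr,nt,np)
end program explicit_inner_loop
```

Without an OpenACC-enabled compiler the directives are treated as comments and the program runs serially, which makes it easy to compare results with and without offloading.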

Hi,
There is no additional compiler feedback for the loops in the code. The code is:

``````
18274	!$acc parallel default(present) present(fj,b)
18275	!$acc loop
18276	      do k=2,npm1
18277	!$acc loop
18278	        do j=2,ntm1
18279	!$acc loop
18280	          do i=1,nrm1
18281	            fj%r(i,j,k)=( ( st(j  )*b%p(i,j  ,k)                     -st
18282	     &(j-1)*b%p(i,j-1,k))*dth_i(j)                   -(b%t(i,j,k)-b%t(i,
18283	     &j,k-1))*dp_mult*dph_i(k)                  )*r_i(i)*sth_i(j)
18284	          enddo
18285	        enddo
18286	      enddo
18287	c
18288	!$acc loop
18289	      do k=2,npm1
18290	!$acc loop
18291	        do j=jm0,jm1
18292	          do i=2,nrm1
18293	        fj%t(i,j,k)=( (b%r(i,j,k)-b%r(i,j,k-1))*dp_mult*dph_i(k)*st_i(j)
18294	     &                   -( r(i  )*b%p(i  ,j,k)                     -r(i
18295	     &-1)*b%p(i-1,j,k))*drh_i(i)                  )*rh_i(i)
18296	          enddo
18297	          fj%t( 1,j,k)=(fj%t(2,j,k)+(fj%t(2,j,k)-fj%t(2+1,j,k))*dr(2-1)*
18298	     &dr_i(2))
18299	          fj%t(nr,j,k)=(fj%t(nrm1,j,k)+(fj%t(nrm1,j,k)-fj%t(nrm1-1,j,k))
18300	     &*dr(nrm1)*dr_i(nrm1-1))
18301	        enddo
18302	      enddo
18303	c
18304	!$acc loop
18305	      do k=1,npm1
18306	!$acc loop
18307	        do j=2,ntm1
18308	          do i=2,nrm1
18309	            fj%p(i,j,k)=( ( r(i  )*b%t(i  ,j,k)                     -r(i
18310	     &-1)*b%t(i-1,j,k))*drh_i(i)                   -(b%r(i,j,k)-b%r(i,j-
18311	     &1,k))*dth_i(j)                  )*rh_i(i)
18312	          enddo
18313	          fj%p( 1,j,k)=(fj%p(2,j,k)+(fj%p(2,j,k)-fj%p(2+1,j,k))*dr(2-1)*
18314	     &dr_i(2))
18315	          fj%p(nr,j,k)=(fj%p(nrm1,j,k)+(fj%p(nrm1,j,k)-fj%p(nrm1-1,j,k))
18316	     &*dr(nrm1)*dr_i(nrm1-1))
18317	        enddo
18318	      enddo
18319	!$acc end parallel
``````

You are correct: I believe this is still running on the GPU, and it is the non-“acc” loops that the compiler is complaining about (I just checked my 18.4 output and it is the same).
This brings up an issue I have been meaning to mention: if I do not put “acc loop” on a loop in a parallel region, shouldn’t the compiler NOT try to parallelize it? I understand that in “kernels” the compiler is expected to do what it can automatically, but since “parallel” is more descriptive of what I really want, I do not think it should try to parallelize loops that I have not marked with “acc loop”.

- Ron
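
Whatever the compiler does by default with undecorated loops, the OpenACC `seq` clause lets the source state explicitly that a loop inside a `parallel` region should stay sequential. The following is a minimal free-form sketch under that assumption. The names `x`, `n`, and `m` are hypothetical, and the thread does not confirm whether adding `seq` changes the feedback messages in any particular PGI release.

```fortran
program seq_inner_loop
  implicit none
  ! Hypothetical sizes and array, chosen only for illustration.
  integer, parameter :: n = 16, m = 4
  integer :: i, j
  real :: x(n,m)

  ! Starting value of each column; the rest is filled in below.
  x(1,:) = 1.0

  !$acc parallel copy(x)
  !$acc loop
  do j = 1, m
    ! The i loop has a genuine loop-carried dependence (each value
    ! depends on the previous one), so it is marked sequential.
    !$acc loop seq
    do i = 2, n
      x(i,j) = x(i-1,j) + real(j)
    enddo
  enddo
  !$acc end parallel

  print *, x(n,1), x(n,m)
end program seq_inner_loop
```

The design choice here is to write down the intended schedule for every loop in the nest, so the outcome does not depend on how a given compiler version treats loops that carry no directive.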