Data dependency?

Hello!
Compiling the code:

 299 !!$acc loop collapse(3) independent
 300         DO i = max(its,ibe-spec_zone+1), itf
 301           b_dist = ibe - i
 302           DO k = kts, ktf
 303             DO j = max(jts,b_dist+jbs+1), min(jtf,jbe-b_dist-1)
 304               mu_old_ = muts(i,j) - dt*mu_tend(i,j)
 305
 306               field(i,k,j) = field(i,k,j)*mu_old_/muts(i,j) + &
 307                    dt*field_tend(i,k,j)/muts(i,j) +               &
 308                    ph_save(i,k,j)*(mu_old_/muts(i,j) - 1.)
 309
 310             ENDDO
 311           ENDDO
 312         ENDDO

I see

    300, Loop is parallelizable
         Accelerator kernel generated
        300, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
    302, Loop is parallelizable
    303, Complex loop carried dependence of 'field' prevents parallelization
         Loop carried dependence of 'field' prevents parallelization
         Loop carried backward dependence of 'field' prevents vectorization
         Inner sequential loop scheduled on accelerator

PGI launches the kernel with grid=1, block=128.

I don’t see any data dependency. Uncommenting the “acc loop” directive on line 299 results in:

    300, Loop is parallelizable
         Accelerator kernel generated
        300, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
    302, Loop is parallelizable
    303, Loop is parallelizable

So there is no indication that the two inner loops were parallelized. Doesn’t PGI trust the programmer?

In this case the launch configuration is the same.

PGI 13.9 was used.

WBR,
Alexey

Hi Alexey,

The problem is that “collapse” can only be applied to tightly nested loops. In your code the assignment to b_dist on line 301 sits between the i and k loops, so the nest is not tightly nested. Hence, you’re telling the compiler to only accelerate the outer loop.

While you may need to experiment a little, given that we want “i” to be our “vector” loop and that the j loop iteration count is determined by the value of b_dist, try the following instead:

 299 !$acc loop gang vector collapse(2) independent
 300         DO i = max(its,ibe-spec_zone+1), itf 
 302           DO k = kts, ktf 
 301             b_dist = ibe - i  
 303             DO j = max(jts,b_dist+jbs+1), min(jtf,jbe-b_dist-1) 
 304               mu_old_ = muts(i,j) - dt*mu_tend(i,j) 
 305 
 306               field(i,k,j) = field(i,k,j)*mu_old_/muts(i,j) + & 
 307                    dt*field_tend(i,k,j)/muts(i,j) +               & 
 308                    ph_save(i,k,j)*(mu_old_/muts(i,j) - 1.) 
 309 
 310             ENDDO 
 311           ENDDO 
 312         ENDDO
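
For anyone who wants to check the restructuring outside the full model, here is a minimal, self-contained sketch. It runs the original nest serially and then the restructured nest, and compares the results. The loop bodies and bounds expressions come from the thread; everything else is an assumption made for illustration: the array extents, the fill values, the program name collapse_check, the enclosing kernels region (the post does not show which compute region surrounds the loop), and the private clause. Built without OpenACC enabled, the directives are plain comments, so this only checks that moving b_dist inside the k loop does not change the answer; it says nothing about the schedule a particular compiler will pick.

```fortran
program collapse_check
  implicit none
  ! All extents and fill values are made up for this test;
  ! only the loop bodies and bounds expressions come from the thread.
  integer, parameter :: its = 1, itf = 12, ibe = 12
  integer, parameter :: jts = 1, jtf = 12, jbs = 1, jbe = 12
  integer, parameter :: kts = 1, ktf = 4, spec_zone = 3
  real, parameter :: dt = 0.5
  real :: muts(its:itf, jts:jtf), mu_tend(its:itf, jts:jtf)
  real :: field_tend(its:itf, kts:ktf, jts:jtf)
  real :: ph_save(its:itf, kts:ktf, jts:jtf)
  real :: field_ref(its:itf, kts:ktf, jts:jtf)
  real :: field(its:itf, kts:ktf, jts:jtf)
  real :: mu_old_
  integer :: i, j, k, b_dist

  ! Arbitrary but deterministic input data (muts is never zero).
  do j = jts, jtf
    do i = its, itf
      muts(i,j) = 1.0 + 0.01*real(i + 2*j)
      mu_tend(i,j) = 0.001*real(i - j)
      do k = kts, ktf
        field_ref(i,k,j) = real(i + k + j)
        field_tend(i,k,j) = 0.1*real(k)
        ph_save(i,k,j) = 0.5*real(j)
      end do
    end do
  end do
  field = field_ref

  ! Original nest, run serially: b_dist sits between the i and k loops.
  do i = max(its, ibe-spec_zone+1), itf
    b_dist = ibe - i
    do k = kts, ktf
      do j = max(jts, b_dist+jbs+1), min(jtf, jbe-b_dist-1)
        mu_old_ = muts(i,j) - dt*mu_tend(i,j)
        field_ref(i,k,j) = field_ref(i,k,j)*mu_old_/muts(i,j) + &
             dt*field_tend(i,k,j)/muts(i,j) + &
             ph_save(i,k,j)*(mu_old_/muts(i,j) - 1.)
      end do
    end do
  end do

  ! Restructured nest: i and k are now tightly nested, so collapse(2) applies.
  ! The kernels region and the private clause are assumptions of this sketch.
  !$acc kernels
  !$acc loop gang vector collapse(2) independent private(b_dist, mu_old_)
  do i = max(its, ibe-spec_zone+1), itf
    do k = kts, ktf
      b_dist = ibe - i
      do j = max(jts, b_dist+jbs+1), min(jtf, jbe-b_dist-1)
        mu_old_ = muts(i,j) - dt*mu_tend(i,j)
        field(i,k,j) = field(i,k,j)*mu_old_/muts(i,j) + &
             dt*field_tend(i,k,j)/muts(i,j) + &
             ph_save(i,k,j)*(mu_old_/muts(i,j) - 1.)
      end do
    end do
  end do
  !$acc end kernels

  ! Loose tolerance to allow for rounding differences on an accelerator.
  if (maxval(abs(field - field_ref)) > 1.0e-4) then
    print *, 'MISMATCH between original and restructured loops'
    stop 1
  end if
  print *, 'restructured loop matches the original'
end program collapse_check
```

Each field(i,k,j) is read and written only by its own (i,k,j) iteration, which is why the two orderings agree and why the independent clause is reasonable here.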

Hope this helps,
Mat

Thank you Mat!

Will read docs more carefully.

Alexey