OpenACC on nested loops

Hi all,

I am able to compile the Fortran code snippet below with OpenACC compiler directives:

!$acc parallel loop reduction(+:potential_dot_dot_acoustic) private(isource,iglob,ispec,i,j,k)
DO isource = 1,NSOURCES
    ispec = ispec_selected_source(isource)
    IF (ispec_is_inner(ispec) .eqv. phase_is_inner) THEN
        DO k = 1,NGLLZ
            DO j = 1,NGLLY
                DO i = 1,NGLLX
                    iglob = ibool(i,j,k,ispec)
                    potential_dot_dot_acoustic(iglob) = potential_dot_dot_acoustic(iglob) - &
                        sourcearrays(isource,1,i,j,k) * stf_pre_compute(isource) / kappastore(i,j,k,ispec)
                END DO
            END DO
        END DO
    END IF ! ispec_is_inner
END DO ! NSOURCES
!$acc end parallel loop

However, not all of the loops get parallelized. The nvfortran compiler emits the following messages:

     97, Generating implicit copyin(kappastore(:,:,:,:)) [if not already present]
         Generating implicit firstprivate(nsources)
         Generating NVIDIA GPU code
         98, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
             Generating reduction(+:potential_dot_dot_acoustic(:))
        101, !$acc loop seq
        102, !$acc loop seq
        103, !$acc loop seq
     97, Generating implicit copyin(ibool(:,:,:,:),ispec_is_inner(:)) [if not already present]
         Generating copy(potential_dot_dot_acoustic(:)) [if not already present]
         Generating implicit copyin(stf_pre_compute(:nsources),sourcearrays(:nsources,:1,:,:,:),ispec_selected_source(:nsources)) [if not already present]
     98, Generating implicit firstprivate(phase_is_inner)
    101, Loop carried dependence of potential_dot_dot_acoustic prevents parallelization
         Loop carried backward dependence of potential_dot_dot_acoustic prevents vectorization
    102, Loop carried dependence of potential_dot_dot_acoustic prevents parallelization
         Loop carried backward dependence of potential_dot_dot_acoustic prevents vectorization
    103, Complex loop carried dependence of potential_dot_dot_acoustic prevents parallelization
         Loop carried dependence of potential_dot_dot_acoustic prevents parallelization
         Loop carried backward dependence of potential_dot_dot_acoustic prevents vectorization

Does anyone have any suggestions on more efficient parallelization?

Thanks,
Jyoti

Hi Jyoti,

Since “potential_dot_dot_acoustic” is indexed through a look-up array (ibool), the compiler can’t auto-parallelize the inner loops. It has to assume that iglob may take the same value on different iterations, in which case concurrent updates to the same element would cause a race condition.
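To see the hazard concretely, here is a minimal, self-contained sketch (all names invented for illustration, not taken from your code): the look-up table contains a duplicate index, so two iterations update the same element, and without an atomic those parallel updates can collide and lose one contribution.

```fortran
program indirect_race
    implicit none
    integer, parameter :: n = 4
    ! look-up table with a duplicate entry (index 2 appears twice),
    ! analogous to ibool mapping different (i,j,k,ispec) to the same iglob
    integer :: idx(n) = (/ 1, 2, 2, 3 /)
    real    :: acc(3) = 0.0
    integer :: i

!$acc parallel loop
    do i = 1, n
        ! without this atomic, the two iterations that hit acc(2)
        ! can read-modify-write concurrently and drop an update
!$acc atomic update
        acc(idx(i)) = acc(idx(i)) + 1.0
    end do
!$acc end parallel loop

    print *, acc   ! expect 1.0 2.0 1.0
end program indirect_race
```

With the atomic in place the result is deterministic regardless of how the iterations are scheduled; without it, acc(2) may end up as 1.0 instead of 2.0 on some runs.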

If you know all iglob values are unique, you can add “loop” directives to the inner DO loops to parallelize them. If the values may repeat, you can still parallelize the loops, but you will also want an “atomic update” directive on the accumulation. Something like:

!$acc parallel loop reduction(+:potential_dot_dot_acoustic) private(isource,iglob,ispec,i,j,k)
DO isource = 1,NSOURCES
    ispec = ispec_selected_source(isource)
    IF (ispec_is_inner(ispec) .eqv. phase_is_inner) THEN
!$acc loop collapse(3)
        DO k = 1,NGLLZ
            DO j = 1,NGLLY
                DO i = 1,NGLLX
                    iglob = ibool(i,j,k,ispec)
! Optionally add an atomic update if iglob values are not unique,
! to avoid race conditions
!$acc atomic update
                    potential_dot_dot_acoustic(iglob) = potential_dot_dot_acoustic(iglob) - &
                        sourcearrays(isource,1,i,j,k) * stf_pre_compute(isource) / kappastore(i,j,k,ispec)
                END DO
            END DO
        END DO
    END IF ! ispec_is_inner
END DO ! NSOURCES
!$acc end parallel loop

-Mat

Hi Mat,

Great! Appreciate it.

Cheers,
Jyoti
