FIRSTPRIVATE and OMP offloading

Hi,

I’m trying to optimize the following code:

!$OMP TARGET TEAMS LOOP BIND(TEAMS)
do e=1,nelt
!$OMP LOOP COLLAPSE(3) BIND(PARALLEL) PRIVATE(tmpu3,l)
  do k=1,lz1
    do j=1,ly1
      do i=1,lx1
        tmpu3 = 0.0
        do l=1,lx1
          tmpu3 = tmpu3 + dxm1(k,l)*u(i,j,l,e)
        enddo

        wr = g5m1(i,j,k,e)*tmpu3
        ws = g6m1(i,j,k,e)*tmpu3
        wt = g3m1(i,j,k,e)*tmpu3

        dudr(i,j,k,e) = (dudr(i,j,k,e) + wr) * helm1(i,j,k,e)
        duds(i,j,k,e) = (duds(i,j,k,e) + ws) * helm1(i,j,k,e)
        dudt(i,j,k,e) = (dudt(i,j,k,e) + wt) * helm1(i,j,k,e)

      enddo
    enddo
  enddo
enddo

The initialization of tmpu3 prevents collapsing all 4 loops together, so my idea is the following:

tmpu3 = 0.0

!$OMP TARGET TEAMS LOOP BIND(TEAMS)
do e=1,nelt
!$OMP LOOP COLLAPSE(4) BIND(PARALLEL) FIRSTPRIVATE(tmpu3,l)
  do k=1,lz1
    do j=1,ly1
      do i=1,lx1
        do l=1,lx1
          tmpu3 = tmpu3 + dxm1(k,l)*u(i,j,l,e)
        enddo

        wr = g5m1(i,j,k,e)*tmpu3
        ws = g6m1(i,j,k,e)*tmpu3
        wt = g3m1(i,j,k,e)*tmpu3

        dudr(i,j,k,e) = (dudr(i,j,k,e) + wr) * helm1(i,j,k,e)
        duds(i,j,k,e) = (duds(i,j,k,e) + ws) * helm1(i,j,k,e)
        dudt(i,j,k,e) = (dudt(i,j,k,e) + wt) * helm1(i,j,k,e)

        tmpu3 = 0.0
      enddo
    enddo
  enddo
enddo

This gives the following error:

NVFORTRAN-S-0533-Clause ‘FIRSTPRIVATE’ not allowed in OMP LOOP

Why is FIRSTPRIVATE not allowed? Is there another way to collapse all 4 loops together? Thanks.

Are you sure you want to collapse the 4 loops together? What is lx1 typically? Unless it is very large (> 64?), I would think you want every thread to run the “do l” loop sequentially. You will get a better access pattern for the u array.
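A minimal sketch of that suggestion, not the poster's actual kernel: the e, k, j and i loops are perfectly nested, so they can be collapsed as a group while each thread runs the short l loop itself with a private accumulator. The output array w, the reduced nelt, and the random input data are assumptions made for illustration, and the directive is only one plausible spelling that has not been tuned for any particular compiler.

```fortran
program sequential_inner_loop
  implicit none
  ! nelt is deliberately small here (assumption) so the sketch runs quickly.
  integer, parameter :: lx1 = 8, ly1 = 8, lz1 = 8, nelt = 4
  real :: u(lx1,ly1,lz1,nelt), dxm1(lz1,lx1)
  real :: w(lx1,ly1,lz1,nelt)      ! hypothetical output array
  real :: tmpu3, err
  integer :: i, j, k, l, e

  call random_number(u)
  call random_number(dxm1)

  ! Collapse the four perfectly nested loops. Each (i,j,k,e) point runs the
  ! l loop sequentially, so tmpu3 only needs to be PRIVATE and can be
  ! initialized inside the innermost collapsed loop.
  !$omp target teams loop collapse(4) private(tmpu3, l)
  do e = 1, nelt
    do k = 1, lz1
      do j = 1, ly1
        do i = 1, lx1
          tmpu3 = 0.0
          do l = 1, lx1
            tmpu3 = tmpu3 + dxm1(k,l)*u(i,j,l,e)
          end do
          w(i,j,k,e) = tmpu3
        end do
      end do
    end do
  end do

  ! Serial check against the same contraction written with the sum intrinsic.
  err = 0.0
  do e = 1, nelt
    do k = 1, lz1
      do j = 1, ly1
        do i = 1, lx1
          err = max(err, abs(w(i,j,k,e) - sum(dxm1(k,:)*u(i,j,:,e))))
        end do
      end do
    end do
  end do
  print *, 'max abs difference:', err
end program sequential_inner_loop
```

Built without OpenMP support, the directive is treated as a comment and the program runs serially, which makes it easy to compare results before enabling offload.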

Hi, these are the loop dimensions:

nelt: 9120
lz1: 8
ly1: 8
lx1: 8

And regardless of whether it is worthwhile, why is FIRSTPRIVATE not allowed?

I’ll have to dig into it. I am confused about what your intended behavior is, and maybe the compiler is confused as well. If you collapse all 4 loops, do you want to do a reduction on tmpu3? But there are only 3 "end do"s. What does firstprivate even mean in such a structure?
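One possible reading of "parallelize the l loop as well" is that the sum over l becomes a reduction on tmpu3. The sketch below assumes that reading and is not taken from the thread: it distributes the (i,j,k,e) points over teams and lets the threads of each team share the l loop through a REDUCTION clause. The output array w, the reduced nelt, and the random inputs are hypothetical.

```fortran
program inner_reduction
  implicit none
  ! nelt is deliberately small here (assumption) so the sketch runs quickly.
  integer, parameter :: lx1 = 8, ly1 = 8, lz1 = 8, nelt = 4
  real :: u(lx1,ly1,lz1,nelt), dxm1(lz1,lx1)
  real :: w(lx1,ly1,lz1,nelt)      ! hypothetical output array
  real :: tmpu3, err
  integer :: i, j, k, l, e

  call random_number(u)
  call random_number(dxm1)

  ! One (i,j,k,e) point per team iteration. The inner PARALLEL DO splits
  ! the l loop across threads and combines the partial sums into tmpu3.
  !$omp target teams distribute collapse(4) private(tmpu3)
  do e = 1, nelt
    do k = 1, lz1
      do j = 1, ly1
        do i = 1, lx1
          tmpu3 = 0.0
          !$omp parallel do reduction(+:tmpu3)
          do l = 1, lx1
            tmpu3 = tmpu3 + dxm1(k,l)*u(i,j,l,e)
          end do
          w(i,j,k,e) = tmpu3
        end do
      end do
    end do
  end do

  ! Serial check; the summation order may differ, so expect rounding-level
  ! differences rather than exact agreement.
  err = 0.0
  do e = 1, nelt
    do k = 1, lz1
      do j = 1, ly1
        do i = 1, lx1
          err = max(err, abs(w(i,j,k,e) - sum(dxm1(k,:)*u(i,j,:,e))))
        end do
      end do
    end do
  end do
  print *, 'max abs difference:', err
end program inner_reduction
```

With lx1 = 8, each inner parallel region has only 8 iterations to share, while the outer loops already expose 9120 x 8 x 8 x 8 = 4,669,440 independent points. That is consistent with the earlier suggestion to keep the l loop sequential in each thread.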