!$acc declare directive, treatment of derived types

I’ve sent the following material to Dave Norton who promised to forward it to Mat, but for the benefit of the community, it is reproduced here.

We are porting the dynamical core of the evolving ICON climate model to GPUs using OpenACC. The code consists many, many triply nested loops like the one below. The derived types are a problem for OpenACC: variables like p_diag%vn_ie, p_int%c_lin_e, p_prog%w and p_diag%e_kinh all need to be on the GPU. The first question is the time frame for supporting derived types on the GPU. If this is not going to be supported soon, major refactoring has to happen.

!$OMP DO PRIVATE(jb, jk, je, i_startidx, i_endidx)
   DO jb = i_startblk, i_endblk

     CALL get_indices_e(p_patch, jb, i_startblk, i_endblk, &
                        i_startidx, i_endidx, rl_start, rl_end)
     DO je = i_startidx, i_endidx
       DO jk = 1, nlev
         ! Multiply vn_ie with w interpolated to edges for divergence computation
         z_vnw(je,jk,jb) = p_diag%vn_ie(je,jk,jb)*                            &
          ( p_int%c_lin_e(je,1,jb) * p_prog%w(icidx(je,jb,1),jk,icblk(je,jb,1)) &
          + p_int%c_lin_e(je,2,jb) * p_prog%w(icidx(je,jb,2),jk,icblk(je,jb,2)) )

         ! Compute horizontal gradient of horizontal kinetic energy
         z_ddxn_ekin_e(je,jk,jb) = p_patch%edges%inv_dual_edge_length(je,jb) *  &
          (p_diag%e_kinh(icidx(je,jb,2),jk,icblk(je,jb,2)) -                    &
           p_diag%e_kinh(icidx(je,jb,1),jk,icblk(je,jb,1)) )
       ENDDO
     ENDDO
   ENDDO
!$OMP END DO

The second question is whether !$acc declare is currently supported, or will be soon. This feature is crucial for this code. We have 12.4 installed. According to the implementation schedule on

declare should be implemented in 12.3, but it seems unavailable:

    !$acc declare create(prog_vn(2,2,3))



PGF90-S-0070-Incorrect sequence of statements (…/…/…/src/atm_dyn_iconam/mo_nonhydro_state.f90: 800)

Could you let us know your timetable for declare?

Thanks,

–Will[/url]

Hi Will,

The first question is the time frame for supporting derived types on the GPU. If this is not going to be supported soon, major refactoring has to happen.

Derived types are supported within OpenACC. What’s not supported yet is acceleration of allocatable arrays within derived types, as is the case here. We have an open request for this feature (TPR#17199) but I’m not sure where Michael and his team are at on it. I’ll ask if he can give us an update.

The second question is whether !$acc declare is currently supported, or will be soon.

The “declare” directive was first added in 12.3 so I suspect some other issue is occurring and I’ll need to see a reproducing example. Will this be in the code you sent Dave? If so, I’ll investigate once he sends it to me. (Note that Dave is currently attending the NVIDIA GTC Conference so may not be able to send the code till next week).

Best Regards,
Mat

Will:
You are right, we’re late on the declare directive, and others. We’re working really, really hard, some of the features turned out to have larger impact than we’d thought. One comment, the list items in a declare are supposed to be either whole arrays, or sections, not array declarations.
!$acc declare create(prog_vn(2,2,3))
looks to the compiler like you want the element prog_vn(2,2,3) created. What you really want is the whole array with that shape. The PGI compiler, when it sees a directive like this with a single array element, will assume that you meant the section, but not all compilers will necessarily do that. I expect this to be a common misconception, and we may have to change the OpenACC spec to be easier in this regard.

As for derived type members, or C struct members, if there are no pointers involved, then it’s probably not too hard. We’ve been leary of implementing this because in Fortran, if you had:
type(dt),pointer :: pp, qq
!$acc copy(pp%data, qq%data)
do i = 1,n
… pp%data(i) + qq%data(i)

very bad things would happen if pp and qq happened to alias, if they pointed to the same data. The pointer aliasing is not preserved when the data is moved to the GPU. It would theoretically be possible to preserve pointer aliasing, but it would have a serious performance impact, and that would impact even programs that never used pointers or had any aliasing. Similarly with arrays of derived types:
type(dt) :: rr(10)

!$acc copy(rr(i)%data,rr(j)%data)
do k = 1, n
rr(i)%data(k) = rr(j)%data(k)…

What if ‘i’ and ‘j’ have the same value? What if they change values within the region? The ways to make this really difficult, and get strange errors, is quite large.

However, if the case you really want is simple derived type variables with member arrays, I think we can work on this.

-Michael Wolfe

Michael, Mat,

Thanks for the extensive replies.

Concerning the declare directive.

You are right, we’re late on the declare directive, and others. We’re working really, really hard, some of the features turned out to have larger impact than we’d thought. One comment, the list items in a declare are supposed to be either whole arrays, or sections, not array declarations.
!$acc declare create(prog_vn(2,2,3))
looks to the compiler like you want the element prog_vn(2,2,3) created.

Thanks for the clarifications. I tried many variants, including simply,

!$acc declare create(prog_vn)

which yields “Incorrect sequence of statements”, and even,

!$acc declare

which yields: “Syntax error at or near end of line”.

If I understand you correctly, declare is not currently (properly?) implemented. When do you think it might be available? Or, if it really is working, could you illustrate a simple example?

Concerning derived types: indeed I meant derived types containing Fortran pointers (not simply allocatable arrays), which will is a very common case in atmospheric codes. Here a real life example:

  TYPE t_nh_prog
    REAL(wp), POINTER :: &
      w(:,:,:),          & !> orthogonal vertical wind (nproma,nlevp1,nblks_c)     [m/s]
      vn(:,:,:),         & !! orthogonal normal wind (nproma,nlev,nblks_e)         [m/s]
      rho(:,:,:),        & !! density (nproma,nlev,nblks_c)                     [kg/m^3]
      exner(:,:,:),      & !! Exner pressure (nproma,nlev,nblks_c)                   [-]
      theta_v(:,:,:),    & !! virtual potential temperature (nproma,nlev,nblks_c)    [K]
      rhotheta_v(:,:,:), & !! rho*theta_v (nproma,nlev,nblks_c)               [K*kg/m^3]
      tracer(:,:,:,:),   & !! tracer concentration (nproma,nlev,nblks_c,ntracer) [kg/kg]
      tke   (:,:,:)        !! turbulent kinetic energy                         [m^2/s^2]
                           !! (defined on half levels) with 2 time levels
    TYPE(t_ptr_nh),ALLOCATABLE :: tracer_ptr(:)  !< pointer array: one pointer for each tracer
  END TYPE t_nh_prog

Once allocated, these arrays are then used in a predictable way (see original code), and do not illustrate any of the pathological behavior you mention.

However, if the case you really want is simple derived type variables with member arrays, I think we can work on this.

That’s great news. Keep this case in mind, since it is very common in these codes. But I guess it is clear that this would take a year or more to integrate into the compiler. I think the best game plan is for me to refactor the code to remove these derived type pointers from the loops. This will probably mean the arrays will be aliases to pointers in the CPU code, but will be toggled over with #ifdefs to actual device arrays in the GPU version. I’m still formulated least inelegant method in my head, and will have an example together in the next few weeks.