Procedure in derived type data

Hi,

I have a question about parallelization of nested loops inside which procedures in derived type data are used. A template of my code is formulated and shown below:

      do k = bound%idx_min(3), bound%idx_max(3)
        do j = bound%idx_min(2), bound%idx_max(2)
          do i = bound%idx_min(1), bound%idx_max(1)

            call find_bc_indicies_from_cell( [i,j,k], bound%face_label,        &
                                             bound%gc_stencil_size,            &
                                             low_idx, high_idx, dir )

            call find_bc_normal_from_cell( [i,j,k], bound%face_label, gblock,  &
                                           normal )

            call bound%fill_ghost_cells(                                       &
                   normal,                                                     &
                   sblock%rho(    low_idx(1):high_idx(1):dir(1),               &
                                  low_idx(2):high_idx(2):dir(2),               &
                                  low_idx(3):high_idx(3):dir(3) ),             &
                   sblock%vel( :, low_idx(1):high_idx(1):dir(1),               &
                                  low_idx(2):high_idx(2):dir(2),               &
                                  low_idx(3):high_idx(3):dir(3) ),             &
                   sblock%p(      low_idx(1):high_idx(1):dir(1),               &
                                  low_idx(2):high_idx(2):dir(2),               &
                                  low_idx(3):high_idx(3):dir(3) ),             &
                   sblock%temp(   low_idx(1):high_idx(1):dir(1),               &
                                  low_idx(2):high_idx(2):dir(2),               &
                                  low_idx(3):high_idx(3):dir(3) ) )

          end do
        end do
      end do

I want to parallelize this nested loop with three layers, however I found that I need to call bound%fill_ghost_cells but bound is an derived type data (declared as “class(bc_t), pointer :: bound”) and class bc_t is defined as:

  type, abstract :: bc_t

    ! Is this boundary connected to another boundary
    logical :: is_connected = .false.

    logical :: overwrites_flux = .false.
    ! True  = this BC calculates its own flux at boundary face
    ! False = this BC uses MUSCL to calculate flux at boundary face

    integer :: block_id        = -1  ! What parent block this is on
    integer :: bc_label        = -1  ! What bc type this is
    integer :: face_label      = -1  ! What face this is on

    ! Number of interior cells in the stencil to fill the GCs
    integer :: gc_stencil_size = -1

    ! Indicies of the interior cells that are along this boundary
    integer, dimension(3) :: idx_min = -1
    integer, dimension(3) :: idx_max = -1

    ! Out direction of this face
    integer, dimension(3) :: out_dir

    ! Information about neighbor
    type(nbor_info_t) :: nbor

  contains

    private

    ! Common Procedures
    procedure, public,   pass :: init_bc
    procedure, public,   pass :: fill_nbor_info
    procedure, public,   pass :: is_xi_face
    procedure, public,   pass :: is_eta_face
    procedure, public,   pass :: is_zeta_face
    procedure, public,   pass :: is_min_face
    procedure, public,   pass :: find_nbor_idx
    procedure, public,   pass :: find_my_idx
    procedure, public, nopass :: extrapolate_through_face_to_gc1
    procedure, public, nopass :: extrapolate_state_to_gc
    procedure, public, nopass :: gc_from_face_jacobians

    ! Deferred Procedures
    procedure(fill_gc_i),      public, deferred, pass :: fill_ghost_cells
    procedure(gc_jacobians_i), public, deferred, pass :: ghost_cell_jacobians

    ! Procedures which are not implemented in general but are overwritten if
    ! this%overwrites_flux
    procedure,                 public,           pass :: boundary_flux
    procedure,                 public,           pass :: boundary_flux_jacobians

  end type bc_t

  type :: bc_holder_t

    class(bc_t), allocatable :: bc

  end type

Here you can see that fill_ghost_cells is a deferred procedure in bc_t. My concern is: if I directly parallize this nested loop, would there be an segmentation fault? And, does OpenACC support calling procedures inside a derived type data? If not, then how should I tackle with this situation? I am a beginner of OpenACC, and I would greatly appreciuate any help! Thanks a lot!

The way this code is written, it will be nearly impossible to accelerate it using OpenACC. The time/registers and overhead for every thread computing the bounds, creating the structures to pass the bounds, etc. will far outweigh the amount of time each thread spends doing real work.

I’d try to simplify it on the CPU first. It will probably speed up there too.

Got it! Thanks!

Hi Brent,

Sorry that I have another question. Last time you mentioned that it would be better to simplify our CPU code first. However, our CFD code is not that easy to be modified as a guy in our lab rewrote part of the whole code using Fortran 95/03, such as derived type data. I am wondering whether OpenACC can support calling a derived type’s procedure or not. Thus, I formulated a very simple code in which boundary enforcement is done by calling a procedure inside derived data and I attached it here. I cannot compile the formulated code successfully, and I really wanted to know whether it is because OpenACC does not support derived type data or because there are mistakes here.

Thank you very much for your help!

Best,

Weicheng

Here is the code:

 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! vel_derived_type is used to create a structured data type that stores the !!
 !! values of velocity at different nodes
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 module vel_derived_type

   type soln_vel
     real(8), allocatable, dimension(:,:,:) :: vel, vel_r
   end type soln_vel

 contains

 !! allocate arrays
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine allocate_vel_array(imin, jmin, kmin, imax, jmax, kmax, soln)

     integer, intent(in)           :: imin, jmin, kmin
     integer, intent(in)           :: imax, jmax, kmax
     type(soln_vel), intent(inout) :: soln

     allocate(soln%vel(imin:imax,jmin:jmax,kmin:kmax),                         &
              soln%vel_r(imin:imax,jmin:jmax,kmin:kmax))

   end subroutine allocate_vel_array

 !! deallocate arrays
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine deallocate_vel_array(imin, jmin, kmin, imax, jmax, kmax, soln)

     integer, intent(in)           :: imin, jmin, kmin
     integer, intent(in)           :: imax, jmax, kmax
     type(soln_vel), intent(inout) :: soln

     deallocate(soln%vel(imin:imax,jmin:jmax,kmin:kmax),                       &
              soln%vel_r(imin:imax,jmin:jmax,kmin:kmax))

   end subroutine deallocate_vel_array

 end module vel_derived_type


 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! boundary_setting is used to enforce boudary conditions
 !!!!!!!!!!!!!!!!!!!!!
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 module boundary_setting

   type bc_t

   ! Indicies of the cells that are along this boundary
   integer, dimension(3) :: idx_min = -1
   integer, dimension(3) :: idx_max = -1

   contains
     procedure, public, nopass :: bc_wall
   end type bc_t

 contains

 !! set wall boundary condition
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine bc_wall(imin,jmin,kmin,imax,jmax,kmax,i,j,vel_i,vel_ir)

     !$acc routine seq
     implicit none

     integer, intent(in) :: imin, jmin, kmin
     integer, intent(in) :: imax, jmax, kmax
     integer, value :: i, j
     real(8), dimension(imin:imax,jmin:jmax,kmin:kmax), intent(inout) :: vel_i
     real(8), dimension(imin:imax,jmin:jmax,kmin:kmax), intent(inout) :: vel_ir

     call linear_extrap(imin,jmin,kmin,imax,jmax,kmax,i,j,vel_i,vel_ir)

   end subroutine bc_wall

 !! linear extrapolation
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine linear_extrap(imin,jmin,kmin,imax,jmax,kmax,i,j,vel_i,vel_ir)

     !$acc routine seq
     implicit none

     integer, intent(in) :: imin,jmin,kmin
     integer, intent(in) :: imax,jmax,kmax
     integer, value :: i, j
     real(8), dimension(imin:imax,jmin:jmax,kmin:kmax), intent(inout) :: vel_i
     real(8), dimension(imin:imax,jmin:jmax,kmin:kmax), intent(inout) :: vel_ir

      vel_i(i,j,kmax) = 2*vel_i(i,j,kmax-1) - vel_i(i,j,kmax)
      vel_ir(i,j,kmax) = 2*vel_i(i,j,kmax-1) - vel_ir(i,j,kmax)

   end subroutine linear_extrap

 end module boundary_setting

 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! this program is used to test whether OpenACC direcives support procedures
 !! in derived type data
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 program derived_procedure_test

   use vel_derived_type
   use boundary_setting

   implicit none

   type(soln_vel) :: soln
   type(bc_t)     :: boundary

   integer :: i, j, k
   integer :: imin, jmin, kmin
   integer :: imax, jmax, kmax

   continue

   imin = 1
   jmin = 1
   kmin = 1

   imax = 3
   jmax = 4
   kmax = 5

   call allocate_vel_array(imin, jmin, kmin, imax, jmax, kmax, soln)

   do k = kmin, kmax
     do j = jmin, jmax
       do i = imin, imax
         soln%vel(i,j,k) = k+j*10+i**2
         soln%vel_r(i,j,k) = 0.0D0
       enddo
     enddo
   enddo

   boundary%idx_min = [imin, jmin, kmax]
   boundary%idx_max = [imax, jmax, kmax]

   !$acc enter data create(soln)
   !$acc enter data copyin(soln%vel, soln%vel_r, boundary)

   !$acc kernels default(present) present(soln)
   !$acc loop independent
   do j = boundary%idx_min(2), boundary%idx_max(2)
     !$acc loop independent
     do i = boundary%idx_min(1), boundary%idx_max(1)
       call boundary%bc_wall(imin, jmin, kmin, imax, jmax, kmax, i, j, &
                             soln%vel, soln%vel_r)
     enddo
   enddo
   !$acc end kernels

   !$acc exit data copyout(soln%vel, soln%vel_r)
   !$acc exit data delete(soln)

   write(*,*) soln%vel(imax,2:jmax,kmax)
   write(*,*) soln%vel_r(imax,2:jmax,kmax)

   call deallocate_vel_array(imin, jmin, kmin, imax, jmax, kmax, soln)

 end program derived_procedure_test

If I do not use GPU, then it could compile and run without any issues. However, if I compile with “-acc -Minfo=accel” added, then it reports “unsupported statement type” error.

I greatly appreciate your help!

Yes, as you’ve found, we do not currently support derived-type procedures, type-bound procedures, or procedure pointers in GPU code.

Ok. Thank you very much!

Best,

Weicheng Xue

hi,
you can try the derived types in main program directly by using : !$cuf kernel do <<<,>>> instead of acc loop directive, but it’s not efficient on acceleration at all. The more complicated the derived type, the much slower it works.

Pearlriver,

Thanks for your suggestion!

Best Regards,

Weicheng