Incorrect output_using derived type data

Sunsincer97 · October 18, 2017, 1:21am

Hi,

I am now having a big compiler and solution issue in my CFD code. I thus formulated a simple version of my code, i.e. extrapolating the values from interior domain to ghost nodes. This simpler code can be executed but the result is incorrect. Could anyone provide any suggestions? Thanks a lot!

The code is attached:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! vel_derived_type is used to create a structured data type that stores the !!
!! values of velocity at different nodes !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
module vel_derived_type

  type soln_vel
    real(8), allocatable, dimension(:,:,:) :: vel, vel_r
  end type soln_vel

contains

!! allocate arrays !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  subroutine allocate_vel_array(imax, jmax, kmax, soln)

    integer, intent(in)           :: imax, jmax, kmax
    type(soln_vel), intent(inout) :: soln

    allocate(soln%vel(1:imax,1:jmax,1:kmax), soln%vel_r(1:imax,1:jmax,1:kmax))

  end subroutine allocate_vel_array

!! deallocate arrays !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  subroutine deallocate_vel_array(imax, jmax, kmax, soln)

    integer, intent(in)           :: imax, jmax, kmax
    type(soln_vel), intent(inout) :: soln

    deallocate(soln%vel(1:imax,1:jmax,1:kmax),soln%vel_r(1:imax,1:jmax,1:kmax))

  end subroutine deallocate_vel_array

end module vel_derived_type


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! boundary_setting is used to enforce boudary conditions !!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
module boundary_setting

contains

!! set upper wall boundary condition !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  subroutine set_upper_wall(imax,jmax,kmax,vel_i,vel_ir)

    !$acc routine seq
    implicit none

    integer, intent(in) :: imax, jmax, kmax
    real(8), dimension(kmax-1:kmax), intent(inout) :: vel_i
    real(8), dimension(kmax-1:kmax), intent(inout) :: vel_ir

    call linear_extrap(imax,jmax,kmax,vel_i,vel_ir)

  end subroutine set_upper_wall

!! add operation !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  subroutine linear_extrap(imax,jmax,kmax,vel_i,vel_ir)

    !$acc routine seq
    implicit none
    integer :: i

    integer, intent(in) :: imax, jmax,kmax
    real(8), dimension(kmax-1:kmax), intent(inout) :: vel_i
    real(8), dimension(kmax-1:kmax), intent(inout) :: vel_ir

     vel_i(kmax) = 2*vel_i(kmax-1) - vel_i(kmax)
     vel_ir(kmax) = 2*vel_i(kmax-1) - vel_ir(kmax)

  end subroutine linear_extrap

end module boundary_setting

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! this program is used to test whether OpenACC direcives supports data !!!!!!!
!! transfer if structured data type is used !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
program derived_data_para_inout_test

  use vel_derived_type
  use boundary_setting

  implicit none

  type(soln_vel) :: soln
  integer :: i, j, k
  integer :: imax, jmax, kmax

  continue

  imax = 3
  jmax = 4
  kmax = 5

  call allocate_vel_array(imax, jmax, kmax, soln)

  do k = 1, kmax
    do j = 1, jmax
      do i = 1, imax
        soln%vel(i,j,k) = k+j*10+i**2
        soln%vel_r(i,j,k) = 0.0D0
      enddo
    enddo
  enddo

  !$acc data copy(soln, soln%vel(:,:,:), soln%vel_r(:,:,:))

  !$acc kernels
  !$acc loop independent
  do j = 1, jmax
    !$acc loop independent
    do i = 1, imax
      call set_upper_wall(imax, jmax, kmax, soln%vel(i,j,kmax-1:kmax), &
                                            soln%vel_r(i,j,kmax-1:kmax))
    enddo
  enddo
  !$acc end kernels

  !$acc end data

  write(*,*) soln%vel(imax,2:jmax,kmax)
  write(*,*) soln%vel_r(imax,2:jmax,kmax)

  call deallocate_vel_array(imax, jmax, kmax, soln)

end program derived_data_para_inout_test

The outputs are shown here:

[weich97@thermisto:Fortran_Test]$ pgfortran routine_para_inout.f90 -o routine_para_inout
[weich97@thermisto:Fortran_Test]$ ./routine_para_inout
    32.00000000000000         42.00000000000000         52.00000000000000     
    66.00000000000000         86.00000000000000         106.0000000000000     
[weich97@thermisto:Fortran_Test]$ pgfortran -acc -Minfo=accel routine_para_inout.f90 -o routine_para_inout
set_upper_wall:
     44, Generating acc routine seq
linear_extrap:
     58, Generating acc routine seq
derived_data_para_inout_test:
    106, Generating copy(soln,soln%vel(:,:,:),soln%vel_r(:,:,:))
    108, Generating copy(tmp$r(:),tmp$r2(:))
    110, Loop is parallelizable
    112, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
        110, !$acc loop gang, vector(4) ! blockidx%y threadidx%y
        112, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
        113, !$acc loop seq
[weich97@thermisto:Fortran_Test]$ ./routine_para_inout
    32.00000000000000         32.00000000000000         32.00000000000000     
    66.00000000000000         66.00000000000000         66.00000000000000

Regards,

Weicheng

MatColgrove · October 18, 2017, 5:25pm

Hi Weicheng,

I see a couple problems. First because “copy” does a shallow copy, you shouldn’t put “soln” in a “copy” clause. This will cause problems when you try to copy back the structures since the device pointers will overwrite your host pointers causing a segv on the host. I’ve re-written the code to perform a manual deep copy to avoid this issue. Note that in PGI 17.7 we did add a new beta implicit deep copy feature which will allow you to have soln in a copy clause (and remove it’s members from the copy). So alternatively if you have access to 17.7 or later, you can try using the “-ta=tesla:deepcopy” option instead.

The second issue is that you’re reshaping an array from 3D to 1D using the non-stride 1 dimension. This causes the compiler to have to create a temporary array that can then be passed into the subroutine. You can see this in the -Minfo messages:

    116, Generating implicit copy(tmp$r(:),tmp$r2(:))

This then causes a race condition, and thus the wrong answers.

It’s a hard problem for the compiler since the “solution” would be to for the compiler to allocate on the device a temp array for each inner loop iteration, copy the data to the temp array, pass in the temp array, and then copy back the results into the 3D array. However, this would absolutely kill the performance of your loop so not something we really want to do.

This is something we’re debating about internally. We want users to be able to port their code more easily, but know if we add this support, it will hurt performance and wont be obvious as to why. So for now, we’re encouraging users to avoid passing arrays that need to be reshaped. For example, pass in the entire array as I have done below. Not ideal, but for performance, it’s something you’ll want to do anyway.

Hope this helps,
Mat

% cat test.F90
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! vel_derived_type is used to create a structured data type that stores the !!
 !! values of velocity at different nodes
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 module vel_derived_type

   type soln_vel
     real(8), allocatable, dimension(:,:,:) :: vel, vel_r
   end type soln_vel

 contains

 !! allocate arrays
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine allocate_vel_array(imax, jmax, kmax, soln)

     integer, intent(in)           :: imax, jmax, kmax
     type(soln_vel), intent(inout) :: soln

     allocate(soln%vel(1:imax,1:jmax,1:kmax), soln%vel_r(1:imax,1:jmax,1:kmax))

   end subroutine allocate_vel_array

 !! deallocate arrays
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine deallocate_vel_array(imax, jmax, kmax, soln)

     integer, intent(in)           :: imax, jmax, kmax
     type(soln_vel), intent(inout) :: soln

     deallocate(soln%vel(1:imax,1:jmax,1:kmax),soln%vel_r(1:imax,1:jmax,1:kmax))

   end subroutine deallocate_vel_array

 end module vel_derived_type


 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! boundary_setting is used to enforce boudary conditions
!!!!!!!!!!!!!!!!!!!!!
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 module boundary_setting

 contains

 !! set upper wall boundary condition
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine set_upper_wall(imax,jmax,kmax,i,j,vel_i,vel_ir)

     !$acc routine seq
     implicit none

     integer, intent(in) :: imax, jmax, kmax
     integer, value :: i, j
     real(8), dimension(:,:,:), intent(inout) :: vel_i
     real(8), dimension(:,:,:), intent(inout) :: vel_ir

     call linear_extrap(imax,jmax,kmax,i,j,vel_i,vel_ir)

   end subroutine set_upper_wall

 !! add operation
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine linear_extrap(imax,jmax,kmax,i,j,vel_i,vel_ir)

     !$acc routine seq
     implicit none

     integer, intent(in) :: imax, jmax,kmax
     integer, value :: i, j
     real(8), dimension(:,:,:), intent(inout) :: vel_i
     real(8), dimension(:,:,:), intent(inout) :: vel_ir

      vel_i(i,j,kmax) = 2*vel_i(i,j,kmax-1) - vel_i(i,j,kmax)
      vel_ir(i,j,kmax) = 2*vel_i(i,j,kmax-1) - vel_ir(i,j,kmax)

   end subroutine linear_extrap

 end module boundary_setting

 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! this program is used to test whether OpenACC direcives supports data
!!!!!!!
 !! transfer if structured data type is used
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 program derived_data_para_inout_test

   use vel_derived_type
   use boundary_setting

   implicit none

   type(soln_vel) :: soln
   integer :: i, j, k
   integer :: imax, jmax, kmax

   continue

   imax = 3
   jmax = 4
   kmax = 5

   call allocate_vel_array(imax, jmax, kmax, soln)

   do k = 1, kmax
     do j = 1, jmax
       do i = 1, imax
         soln%vel(i,j,k) = k+j*10+i**2
         soln%vel_r(i,j,k) = 0.0D0
       enddo
     enddo
   enddo

   !$acc enter data create(soln)
   !$acc enter data copyin(soln%vel(:,:,:), soln%vel_r(:,:,:))

   !$acc kernels present(soln)
   !$acc loop independent !collapse(2)
   do j = 1, jmax
     !$acc loop independent
     do i = 1, imax
       call set_upper_wall(imax, jmax, kmax, i, j, &
        soln%vel, soln%vel_r)
     enddo
   enddo
   !$acc end kernels

   !$acc exit data copyout(soln%vel(:,:,:), soln%vel_r(:,:,:))
   !$acc exit data delete(soln)

   write(*,*) soln%vel(imax,2:jmax,kmax)
   write(*,*) soln%vel_r(imax,2:jmax,kmax)

   call deallocate_vel_array(imax, jmax, kmax, soln)

 end program derived_data_para_inout_test
% pgfortran -V17.4 -ta=tesla:cc60 -Minfo=accel -fast test.F90
set_upper_wall:
     49, Generating acc routine seq
         Generating Tesla code
linear_extrap:
     65, Generating acc routine seq
         Generating Tesla code
derived_data_para_inout_test:
    115, Generating enter data create(soln)
    116, Generating enter data copyin(soln%vel_r(:,:,:),soln%vel(:,:,:))
    118, Generating present(soln)
    120, Loop is parallelizable
    122, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
        120, !$acc loop gang, vector(4) ! blockidx%y threadidx%y
        122, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
    129, Generating exit data copyout(soln%vel(:,:,:),soln%vel_r(:,:,:))
    130, Generating exit data delete(soln)
% a.out
    32.00000000000000         42.00000000000000         52.00000000000000
    66.00000000000000         86.00000000000000         106.0000000000000

Sunsincer97 · October 18, 2017, 6:40pm

mkcolg:

Hi Weicheng,

I see a couple problems. First because “copy” does a shallow copy, you shouldn’t put “soln” in a “copy” clause. This will cause problems when you try to copy back the structures since the device pointers will overwrite your host pointers causing a segv on the host. I’ve re-written the code to perform a manual deep copy to avoid this issue. Note that in PGI 17.7 we did add a new beta implicit deep copy feature which will allow you to have soln in a copy clause (and remove it’s members from the copy). So alternatively if you have access to 17.7 or later, you can try using the “-ta=tesla:deepcopy” option instead.

The second issue is that you’re reshaping an array from 3D to 1D using the non-stride 1 dimension. This causes the compiler to have to create a temporary array that can then be passed into the subroutine. You can see this in the -Minfo messages:

    116, Generating implicit copy(tmp$r(:),tmp$r2(:))

This then causes a race condition, and thus the wrong answers.

It’s a hard problem for the compiler since the “solution” would be to for the compiler to allocate on the device a temp array for each inner loop iteration, copy the data to the temp array, pass in the temp array, and then copy back the results into the 3D array. However, this would absolutely kill the performance of your loop so not something we really want to do.

This is something we’re debating about internally. We want users to be able to port their code more easily, but know if we add this support, it will hurt performance and wont be obvious as to why. So for now, we’re encouraging users to avoid passing arrays that need to be reshaped. For example, pass in the entire array as I have done below. Not ideal, but for performance, it’s something you’ll want to do anyway.

Hope this helps,
Mat

% cat test.F90
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! vel_derived_type is used to create a structured data type that stores the !!
 !! values of velocity at different nodes
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 module vel_derived_type

   type soln_vel
     real(8), allocatable, dimension(:,:,:) :: vel, vel_r
   end type soln_vel

 contains

 !! allocate arrays
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine allocate_vel_array(imax, jmax, kmax, soln)

     integer, intent(in)           :: imax, jmax, kmax
     type(soln_vel), intent(inout) :: soln

     allocate(soln%vel(1:imax,1:jmax,1:kmax), soln%vel_r(1:imax,1:jmax,1:kmax))

   end subroutine allocate_vel_array

 !! deallocate arrays
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine deallocate_vel_array(imax, jmax, kmax, soln)

     integer, intent(in)           :: imax, jmax, kmax
     type(soln_vel), intent(inout) :: soln

     deallocate(soln%vel(1:imax,1:jmax,1:kmax),soln%vel_r(1:imax,1:jmax,1:kmax))

   end subroutine deallocate_vel_array

 end module vel_derived_type


 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! boundary_setting is used to enforce boudary conditions
!!!!!!!!!!!!!!!!!!!!!
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 module boundary_setting

 contains

 !! set upper wall boundary condition
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine set_upper_wall(imax,jmax,kmax,i,j,vel_i,vel_ir)

     !$acc routine seq
     implicit none

     integer, intent(in) :: imax, jmax, kmax
     integer, value :: i, j
     real(8), dimension(:,:,:), intent(inout) :: vel_i
     real(8), dimension(:,:,:), intent(inout) :: vel_ir

     call linear_extrap(imax,jmax,kmax,i,j,vel_i,vel_ir)

   end subroutine set_upper_wall

 !! add operation
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine linear_extrap(imax,jmax,kmax,i,j,vel_i,vel_ir)

     !$acc routine seq
     implicit none

     integer, intent(in) :: imax, jmax,kmax
     integer, value :: i, j
     real(8), dimension(:,:,:), intent(inout) :: vel_i
     real(8), dimension(:,:,:), intent(inout) :: vel_ir

      vel_i(i,j,kmax) = 2*vel_i(i,j,kmax-1) - vel_i(i,j,kmax)
      vel_ir(i,j,kmax) = 2*vel_i(i,j,kmax-1) - vel_ir(i,j,kmax)

   end subroutine linear_extrap

 end module boundary_setting

 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! this program is used to test whether OpenACC direcives supports data
!!!!!!!
 !! transfer if structured data type is used
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 program derived_data_para_inout_test

   use vel_derived_type
   use boundary_setting

   implicit none

   type(soln_vel) :: soln
   integer :: i, j, k
   integer :: imax, jmax, kmax

   continue

   imax = 3
   jmax = 4
   kmax = 5

   call allocate_vel_array(imax, jmax, kmax, soln)

   do k = 1, kmax
     do j = 1, jmax
       do i = 1, imax
         soln%vel(i,j,k) = k+j*10+i**2
         soln%vel_r(i,j,k) = 0.0D0
       enddo
     enddo
   enddo

   !$acc enter data create(soln)
   !$acc enter data copyin(soln%vel(:,:,:), soln%vel_r(:,:,:))

   !$acc kernels present(soln)
   !$acc loop independent !collapse(2)
   do j = 1, jmax
     !$acc loop independent
     do i = 1, imax
       call set_upper_wall(imax, jmax, kmax, i, j, &
        soln%vel, soln%vel_r)
     enddo
   enddo
   !$acc end kernels

   !$acc exit data copyout(soln%vel(:,:,:), soln%vel_r(:,:,:))
   !$acc exit data delete(soln)

   write(*,*) soln%vel(imax,2:jmax,kmax)
   write(*,*) soln%vel_r(imax,2:jmax,kmax)

   call deallocate_vel_array(imax, jmax, kmax, soln)

 end program derived_data_para_inout_test
% pgfortran -V17.4 -ta=tesla:cc60 -Minfo=accel -fast test.F90
set_upper_wall:
     49, Generating acc routine seq
         Generating Tesla code
linear_extrap:
     65, Generating acc routine seq
         Generating Tesla code
derived_data_para_inout_test:
    115, Generating enter data create(soln)
    116, Generating enter data copyin(soln%vel_r(:,:,:),soln%vel(:,:,:))
    118, Generating present(soln)
    120, Loop is parallelizable
    122, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
        120, !$acc loop gang, vector(4) ! blockidx%y threadidx%y
        122, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
    129, Generating exit data copyout(soln%vel(:,:,:),soln%vel_r(:,:,:))
    130, Generating exit data delete(soln)
% a.out
    32.00000000000000         42.00000000000000         52.00000000000000
    66.00000000000000         86.00000000000000         106.0000000000000

Hi Mat,

Got it! This is really a big help!! Thank you so much!

Regards,

Weicheng

Sunsincer97 · October 30, 2017, 7:54pm

mkcolg:

Hi Weicheng,

I see a couple problems. First because “copy” does a shallow copy, you shouldn’t put “soln” in a “copy” clause. This will cause problems when you try to copy back the structures since the device pointers will overwrite your host pointers causing a segv on the host. I’ve re-written the code to perform a manual deep copy to avoid this issue. Note that in PGI 17.7 we did add a new beta implicit deep copy feature which will allow you to have soln in a copy clause (and remove it’s members from the copy). So alternatively if you have access to 17.7 or later, you can try using the “-ta=tesla:deepcopy” option instead.

The second issue is that you’re reshaping an array from 3D to 1D using the non-stride 1 dimension. This causes the compiler to have to create a temporary array that can then be passed into the subroutine. You can see this in the -Minfo messages:

    116, Generating implicit copy(tmp$r(:),tmp$r2(:))

This then causes a race condition, and thus the wrong answers.

It’s a hard problem for the compiler since the “solution” would be to for the compiler to allocate on the device a temp array for each inner loop iteration, copy the data to the temp array, pass in the temp array, and then copy back the results into the 3D array. However, this would absolutely kill the performance of your loop so not something we really want to do.

This is something we’re debating about internally. We want users to be able to port their code more easily, but know if we add this support, it will hurt performance and wont be obvious as to why. So for now, we’re encouraging users to avoid passing arrays that need to be reshaped. For example, pass in the entire array as I have done below. Not ideal, but for performance, it’s something you’ll want to do anyway.

Hope this helps,
Mat

% cat test.F90
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! vel_derived_type is used to create a structured data type that stores the !!
 !! values of velocity at different nodes
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 module vel_derived_type

   type soln_vel
     real(8), allocatable, dimension(:,:,:) :: vel, vel_r
   end type soln_vel

 contains

 !! allocate arrays
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine allocate_vel_array(imax, jmax, kmax, soln)

     integer, intent(in)           :: imax, jmax, kmax
     type(soln_vel), intent(inout) :: soln

     allocate(soln%vel(1:imax,1:jmax,1:kmax), soln%vel_r(1:imax,1:jmax,1:kmax))

   end subroutine allocate_vel_array

 !! deallocate arrays
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine deallocate_vel_array(imax, jmax, kmax, soln)

     integer, intent(in)           :: imax, jmax, kmax
     type(soln_vel), intent(inout) :: soln

     deallocate(soln%vel(1:imax,1:jmax,1:kmax),soln%vel_r(1:imax,1:jmax,1:kmax))

   end subroutine deallocate_vel_array

 end module vel_derived_type


 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! boundary_setting is used to enforce boudary conditions
!!!!!!!!!!!!!!!!!!!!!
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 module boundary_setting

 contains

 !! set upper wall boundary condition
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine set_upper_wall(imax,jmax,kmax,i,j,vel_i,vel_ir)

     !$acc routine seq
     implicit none

     integer, intent(in) :: imax, jmax, kmax
     integer, value :: i, j
     real(8), dimension(:,:,:), intent(inout) :: vel_i
     real(8), dimension(:,:,:), intent(inout) :: vel_ir

     call linear_extrap(imax,jmax,kmax,i,j,vel_i,vel_ir)

   end subroutine set_upper_wall

 !! add operation
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine linear_extrap(imax,jmax,kmax,i,j,vel_i,vel_ir)

     !$acc routine seq
     implicit none

     integer, intent(in) :: imax, jmax,kmax
     integer, value :: i, j
     real(8), dimension(:,:,:), intent(inout) :: vel_i
     real(8), dimension(:,:,:), intent(inout) :: vel_ir

      vel_i(i,j,kmax) = 2*vel_i(i,j,kmax-1) - vel_i(i,j,kmax)
      vel_ir(i,j,kmax) = 2*vel_i(i,j,kmax-1) - vel_ir(i,j,kmax)

   end subroutine linear_extrap

 end module boundary_setting

 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! this program is used to test whether OpenACC direcives supports data
!!!!!!!
 !! transfer if structured data type is used
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 program derived_data_para_inout_test

   use vel_derived_type
   use boundary_setting

   implicit none

   type(soln_vel) :: soln
   integer :: i, j, k
   integer :: imax, jmax, kmax

   continue

   imax = 3
   jmax = 4
   kmax = 5

   call allocate_vel_array(imax, jmax, kmax, soln)

   do k = 1, kmax
     do j = 1, jmax
       do i = 1, imax
         soln%vel(i,j,k) = k+j*10+i**2
         soln%vel_r(i,j,k) = 0.0D0
       enddo
     enddo
   enddo

   !$acc enter data create(soln)
   !$acc enter data copyin(soln%vel(:,:,:), soln%vel_r(:,:,:))

   !$acc kernels present(soln)
   !$acc loop independent !collapse(2)
   do j = 1, jmax
     !$acc loop independent
     do i = 1, imax
       call set_upper_wall(imax, jmax, kmax, i, j, &
        soln%vel, soln%vel_r)
     enddo
   enddo
   !$acc end kernels

   !$acc exit data copyout(soln%vel(:,:,:), soln%vel_r(:,:,:))
   !$acc exit data delete(soln)

   write(*,*) soln%vel(imax,2:jmax,kmax)
   write(*,*) soln%vel_r(imax,2:jmax,kmax)

   call deallocate_vel_array(imax, jmax, kmax, soln)

 end program derived_data_para_inout_test
% pgfortran -V17.4 -ta=tesla:cc60 -Minfo=accel -fast test.F90
set_upper_wall:
     49, Generating acc routine seq
         Generating Tesla code
linear_extrap:
     65, Generating acc routine seq
         Generating Tesla code
derived_data_para_inout_test:
    115, Generating enter data create(soln)
    116, Generating enter data copyin(soln%vel_r(:,:,:),soln%vel(:,:,:))
    118, Generating present(soln)
    120, Loop is parallelizable
    122, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
        120, !$acc loop gang, vector(4) ! blockidx%y threadidx%y
        122, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
    129, Generating exit data copyout(soln%vel(:,:,:),soln%vel_r(:,:,:))
    130, Generating exit data delete(soln)
% a.out
    32.00000000000000         42.00000000000000         52.00000000000000
    66.00000000000000         86.00000000000000         106.0000000000000

Hi Mat,

I found a new problem. I am now working on optimizing my CFD code which has ghost cells, and it seems that the solution is always incorrect, potentially due to indexing error. Therefore I modified the simple code (which I used previously) by changing:

imax = 3
   jmax = 4
   kmax = 5

to the new one:

imin  = -1
   jmin  = -1
   kmin = -1

   imax = 3
   jmax = 4
   kmax = 5

Now I get the following code:

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! vel_derived_type is used to create a structured data type that stores the !!
 !! values of velocity at different nodes
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 module vel_derived_type

   type soln_vel
     real(8), allocatable, dimension(:,:,:) :: vel, vel_r
   end type soln_vel

 contains

 !! allocate arrays
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine allocate_vel_array(imin, jmin, kmin, imax, jmax, kmax, soln)

     integer, intent(in)           :: imin, jmin, kmin
     integer, intent(in)           :: imax, jmax, kmax
     type(soln_vel), intent(inout) :: soln

     allocate(soln%vel(imin:imax,jmin:jmax,kmin:kmax),                         &
              soln%vel_r(imin:imax,jmin:jmax,kmin:kmax))

   end subroutine allocate_vel_array

 !! deallocate arrays
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine deallocate_vel_array(imin, jmin, kmin, imax, jmax, kmax, soln)

     integer, intent(in)           :: imin, jmin, kmin
     integer, intent(in)           :: imax, jmax, kmax
     type(soln_vel), intent(inout) :: soln

     deallocate(soln%vel(imin:imax,jmin:jmax,kmin:kmax),                       &
              soln%vel_r(imin:imax,jmin:jmax,kmin:kmax))

   end subroutine deallocate_vel_array

 end module vel_derived_type


 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! boundary_setting is used to enforce boudary conditions
!!!!!!!!!!!!!!!!!!!!!
 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 module boundary_setting

 contains

 !! set upper wall boundary condition
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine set_upper_wall(imin,jmin,kmin,imax,jmax,kmax,i,j,vel_i,vel_ir)

     !$acc routine seq
     implicit none

     integer, intent(in) :: imin, jmin, kmin
     integer, intent(in) :: imax, jmax, kmax
     integer, value :: i, j
     !real(8), dimension(imin:imax,jmin:jmax,kmin:kmax), intent(inout) :: vel_i
     !real(8), dimension(imin:imax,jmin:jmax,kmin:kmax), intent(inout) :: vel_ir
     real(8), dimension(:,:,:), intent(inout) :: vel_i
     real(8), dimension(:,:,:), intent(inout) :: vel_ir

     call linear_extrap(imin,jmin,kmin,imax,jmax,kmax,i,j,vel_i,vel_ir)

   end subroutine set_upper_wall

 !! add operation
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   subroutine linear_extrap(imin,jmin,kmin,imax,jmax,kmax,i,j,vel_i,vel_ir)

     !$acc routine seq
     implicit none

     integer, intent(in) :: imin,jmin,kmin,imax,jmax,kmax
     integer, value :: i, j
     !real(8), dimension(imin:imax,jmin:jmax,kmin:kmax), intent(inout) :: vel_i
     !real(8), dimension(imin:imax,jmin:jmax,kmin:kmax), intent(inout) :: vel_ir
     real(8), dimension(:,:,:), intent(inout) :: vel_i
     real(8), dimension(:,:,:), intent(inout) :: vel_ir

      vel_i(i,j,kmax) = 2*vel_i(i,j,kmax-1) - vel_i(i,j,kmax)
      vel_ir(i,j,kmax) = 2*vel_i(i,j,kmax-1) - vel_ir(i,j,kmax)

   end subroutine linear_extrap

 end module boundary_setting

 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 !! this program is used to test whether OpenACC direcives supports data
!!!!!!!
 !! transfer if structured data type is used
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 program derived_data_para_inout_test

   use vel_derived_type
   use boundary_setting

   implicit none

   type(soln_vel) :: soln
   integer :: i, j, k
   integer :: imin, jmin, kmin
   integer :: imax, jmax, kmax

   continue

   imin = -1
   jmin = -1
   kmin = -1

   imax = 3
   jmax = 4
   kmax = 5

   call allocate_vel_array(imin, jmin, kmin, imax, jmax, kmax, soln)

   do k = kmin, kmax
     do j = jmin, jmax
       do i = imin, imax
         soln%vel(i,j,k) = k+j*10+i**2
         soln%vel_r(i,j,k) = 0.0D0
       enddo
     enddo
   enddo

   !$acc enter data create(soln)
   !$acc enter data copyin(soln%vel(:,:,:), soln%vel_r(:,:,:))

   !$acc kernels present(soln)
   !$acc loop independent !collapse(2)
   do j = 1, jmax
     !$acc loop independent
     do i = 1, imax
       call set_upper_wall(imin, jmin, kmin, imax, jmax, kmax, i, j, &
                           soln%vel, soln%vel_r)
     enddo
   enddo
   !$acc end kernels

   !$acc exit data copyout(soln%vel(:,:,:), soln%vel_r(:,:,:))
   !$acc exit data delete(soln)

   write(*,*) soln%vel(imax,2:jmax,kmax)
   write(*,*) soln%vel_r(imax,2:jmax,kmax)

   call deallocate_vel_array(imin, jmin, kmin, imax, jmax, kmax, soln)

 end program derived_data_para_inout_test

I hoped to see the same output given by the former code:

    32.00000000000000         42.00000000000000         52.00000000000000     
    66.00000000000000         86.00000000000000         106.0000000000000

However, when Icompiled and ran, I got incorrect result:

[weich97@thermisto:Fortran_Test]$ pgfortran -acc -Minfo=accel derived_data_para_inout_test.f90 -o derived_data_para_inout_test
set_upper_wall:
     53, Generating acc routine seq
linear_extrap:
     72, Generating acc routine seq
derived_data_para_inout_test:
    129, Generating enter data create(soln)
    130, Generating enter data copyin(soln%vel(:,:,:),soln%vel_r(:,:,:))
    132, Generating present(soln)
    134, Loop is parallelizable
    136, Loop is parallelizable
         Accelerator kernel generated
         Generating Tesla code
        134, !$acc loop gang, vector(4) ! blockidx%y threadidx%y
        136, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
    143, Generating exit data copyout(soln%vel(:,:,:),soln%vel_r(:,:,:))
    144, Generating exit data delete(soln)
[weich97@thermisto:Fortran_Test]$ ./derived_data_para_inout_test     34.00000000000000         44.00000000000000         54.00000000000000     
    0.000000000000000         0.000000000000000         0.000000000000000

Since in my CFD code, ghost cells are used (so the index does not starts from 1, it starts from a negative value ), I need to make sure there is a way to pass data correctly into routines, when OpenACC directives are used. Could you please have a check and provide some suggestions for me? I would greatly appreciate your help!

Best Regards,

Weicheng Xue

MatColgrove · October 31, 2017, 9:52pm

Hi Weicheng Xue,

The error is in your Fortran code and will fail with our without the OpenACC directives enabled.

The problem is that you’re passing in the arrays as assumed shape (i.e. “dimension(:,:,:)”) in which case the lower bound is presumed to be 1. If you instead use the explicit bounds or map your indices (i.e. -1 becomes 1, 0 becomes 2, etc.), then it should work.

I just uncommented out the two spots where you use the explicit bounds and your code then worked as expected.

     real(8), dimension(imin:imax,jmin:jmax,kmin:kmax), intent(inout) :: vel_i 
     real(8), dimension(imin:imax,jmin:jmax,kmin:kmax), intent(inout) :: vel_ir
     !!! real(8), dimension(:,:,:), intent(inout) :: vel_i 
     !!! real(8), dimension(:,:,:), intent(inout) :: vel_ir

-Mat

Sunsincer97 · October 31, 2017, 10:40pm

mkcolg:

Hi Weicheng Xue,

The error is in your Fortran code and will fail with our without the OpenACC directives enabled.

The problem is that you’re passing in the arrays as assumed shape (i.e. “dimension(:,:,:)”) in which case the lower bound is presumed to be 1. If you instead use the explicit bounds or map your indices (i.e. -1 becomes 1, 0 becomes 2, etc.), then it should work.

I just uncommented out the two spots where you use the explicit bounds and your code then worked as expected.
     real(8), dimension(imin:imax,jmin:jmax,kmin:kmax), intent(inout) :: vel_i 
     real(8), dimension(imin:imax,jmin:jmax,kmin:kmax), intent(inout) :: vel_ir
     !!! real(8), dimension(:,:,:), intent(inout) :: vel_i 
     !!! real(8), dimension(:,:,:), intent(inout) :: vel_ir
-Mat

Thanks Mat! It does work. I made a mistake there.

Best Regards,

Weicheng Xue

Topic		Replies	Views
Procedure in derived type data Legacy PGI Compilers	7	4323	December 30, 2018
Using Fortran derived types and cuBLAS Legacy PGI Compilers	19	12047	June 24, 2016
Need advices for optimizing heart of CFD code Legacy PGI Compilers	11	7065	July 13, 2016
Call in OpenACC region to procedure 'pgf90_copy_f90_argl' Legacy PGI Compilers	10	11401	July 5, 2017
copy(movement) of user defined objects to the gpu in OpenACC Legacy PGI Compilers	22	22493	December 10, 2018
Clarification on using OpenACC in a shared library Legacy PGI Compilers	27	4495	December 9, 2020
error 700: Illegal address during kernel execution Legacy PGI Compilers	6	5007	April 16, 2019
Accelerator clause: upper bound for dimension 0 of array Legacy PGI Compilers	7	4093	May 24, 2018
Vector array assignments within a $acc parallel region Legacy PGI Compilers	13	10948	November 27, 2013
Operators both on host and device functions Legacy PGI Compilers	21	10646	October 12, 2010

Incorrect output_using derived type data

Related topics