I am now having a big compiler and solution issue in my CFD code. I thus formulated a simple version of my code, i.e. extrapolating the values from interior domain to ghost nodes. This simpler code can be executed but the result is incorrect. Could anyone provide any suggestions? Thanks a lot!
The code is attached:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! vel_derived_type is used to create a structured data type that stores the !!
!! values of velocity at different nodes !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
module vel_derived_type
type soln_vel
real(8), allocatable, dimension(:,:,:) :: vel, vel_r
end type soln_vel
contains
!! allocate arrays !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine allocate_vel_array(imax, jmax, kmax, soln)
integer, intent(in) :: imax, jmax, kmax
type(soln_vel), intent(inout) :: soln
allocate(soln%vel(1:imax,1:jmax,1:kmax), soln%vel_r(1:imax,1:jmax,1:kmax))
end subroutine allocate_vel_array
!! deallocate arrays !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine deallocate_vel_array(imax, jmax, kmax, soln)
integer, intent(in) :: imax, jmax, kmax
type(soln_vel), intent(inout) :: soln
deallocate(soln%vel(1:imax,1:jmax,1:kmax),soln%vel_r(1:imax,1:jmax,1:kmax))
end subroutine deallocate_vel_array
end module vel_derived_type
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! boundary_setting is used to enforce boudary conditions !!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
module boundary_setting
contains
!! set upper wall boundary condition !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine set_upper_wall(imax,jmax,kmax,vel_i,vel_ir)
!$acc routine seq
implicit none
integer, intent(in) :: imax, jmax, kmax
real(8), dimension(kmax-1:kmax), intent(inout) :: vel_i
real(8), dimension(kmax-1:kmax), intent(inout) :: vel_ir
call linear_extrap(imax,jmax,kmax,vel_i,vel_ir)
end subroutine set_upper_wall
!! add operation !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine linear_extrap(imax,jmax,kmax,vel_i,vel_ir)
!$acc routine seq
implicit none
integer :: i
integer, intent(in) :: imax, jmax,kmax
real(8), dimension(kmax-1:kmax), intent(inout) :: vel_i
real(8), dimension(kmax-1:kmax), intent(inout) :: vel_ir
vel_i(kmax) = 2*vel_i(kmax-1) - vel_i(kmax)
vel_ir(kmax) = 2*vel_i(kmax-1) - vel_ir(kmax)
end subroutine linear_extrap
end module boundary_setting
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! this program is used to test whether OpenACC direcives supports data !!!!!!!
!! transfer if structured data type is used !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
program derived_data_para_inout_test
use vel_derived_type
use boundary_setting
implicit none
type(soln_vel) :: soln
integer :: i, j, k
integer :: imax, jmax, kmax
continue
imax = 3
jmax = 4
kmax = 5
call allocate_vel_array(imax, jmax, kmax, soln)
do k = 1, kmax
do j = 1, jmax
do i = 1, imax
soln%vel(i,j,k) = k+j*10+i**2
soln%vel_r(i,j,k) = 0.0D0
enddo
enddo
enddo
!$acc data copy(soln, soln%vel(:,:,:), soln%vel_r(:,:,:))
!$acc kernels
!$acc loop independent
do j = 1, jmax
!$acc loop independent
do i = 1, imax
call set_upper_wall(imax, jmax, kmax, soln%vel(i,j,kmax-1:kmax), &
soln%vel_r(i,j,kmax-1:kmax))
enddo
enddo
!$acc end kernels
!$acc end data
write(*,*) soln%vel(imax,2:jmax,kmax)
write(*,*) soln%vel_r(imax,2:jmax,kmax)
call deallocate_vel_array(imax, jmax, kmax, soln)
end program derived_data_para_inout_test
I see a couple problems. First because “copy” does a shallow copy, you shouldn’t put “soln” in a “copy” clause. This will cause problems when you try to copy back the structures since the device pointers will overwrite your host pointers causing a segv on the host. I’ve re-written the code to perform a manual deep copy to avoid this issue. Note that in PGI 17.7 we did add a new beta implicit deep copy feature which will allow you to have soln in a copy clause (and remove it’s members from the copy). So alternatively if you have access to 17.7 or later, you can try using the “-ta=tesla:deepcopy” option instead.
The second issue is that you’re reshaping an array from 3D to 1D using the non-stride 1 dimension. This causes the compiler to have to create a temporary array that can then be passed into the subroutine. You can see this in the -Minfo messages:
116, Generating implicit copy(tmp$r(:),tmp$r2(:))
This then causes a race condition, and thus the wrong answers.
It’s a hard problem for the compiler since the “solution” would be to for the compiler to allocate on the device a temp array for each inner loop iteration, copy the data to the temp array, pass in the temp array, and then copy back the results into the 3D array. However, this would absolutely kill the performance of your loop so not something we really want to do.
This is something we’re debating about internally. We want users to be able to port their code more easily, but know if we add this support, it will hurt performance and wont be obvious as to why. So for now, we’re encouraging users to avoid passing arrays that need to be reshaped. For example, pass in the entire array as I have done below. Not ideal, but for performance, it’s something you’ll want to do anyway.
Hope this helps,
Mat
% cat test.F90
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! vel_derived_type is used to create a structured data type that stores the !!
!! values of velocity at different nodes
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
module vel_derived_type
type soln_vel
real(8), allocatable, dimension(:,:,:) :: vel, vel_r
end type soln_vel
contains
!! allocate arrays
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine allocate_vel_array(imax, jmax, kmax, soln)
integer, intent(in) :: imax, jmax, kmax
type(soln_vel), intent(inout) :: soln
allocate(soln%vel(1:imax,1:jmax,1:kmax), soln%vel_r(1:imax,1:jmax,1:kmax))
end subroutine allocate_vel_array
!! deallocate arrays
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine deallocate_vel_array(imax, jmax, kmax, soln)
integer, intent(in) :: imax, jmax, kmax
type(soln_vel), intent(inout) :: soln
deallocate(soln%vel(1:imax,1:jmax,1:kmax),soln%vel_r(1:imax,1:jmax,1:kmax))
end subroutine deallocate_vel_array
end module vel_derived_type
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! boundary_setting is used to enforce boudary conditions
!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
module boundary_setting
contains
!! set upper wall boundary condition
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine set_upper_wall(imax,jmax,kmax,i,j,vel_i,vel_ir)
!$acc routine seq
implicit none
integer, intent(in) :: imax, jmax, kmax
integer, value :: i, j
real(8), dimension(:,:,:), intent(inout) :: vel_i
real(8), dimension(:,:,:), intent(inout) :: vel_ir
call linear_extrap(imax,jmax,kmax,i,j,vel_i,vel_ir)
end subroutine set_upper_wall
!! add operation
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
subroutine linear_extrap(imax,jmax,kmax,i,j,vel_i,vel_ir)
!$acc routine seq
implicit none
integer, intent(in) :: imax, jmax,kmax
integer, value :: i, j
real(8), dimension(:,:,:), intent(inout) :: vel_i
real(8), dimension(:,:,:), intent(inout) :: vel_ir
vel_i(i,j,kmax) = 2*vel_i(i,j,kmax-1) - vel_i(i,j,kmax)
vel_ir(i,j,kmax) = 2*vel_i(i,j,kmax-1) - vel_ir(i,j,kmax)
end subroutine linear_extrap
end module boundary_setting
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! this program is used to test whether OpenACC direcives supports data
!!!!!!!
!! transfer if structured data type is used
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
program derived_data_para_inout_test
use vel_derived_type
use boundary_setting
implicit none
type(soln_vel) :: soln
integer :: i, j, k
integer :: imax, jmax, kmax
continue
imax = 3
jmax = 4
kmax = 5
call allocate_vel_array(imax, jmax, kmax, soln)
do k = 1, kmax
do j = 1, jmax
do i = 1, imax
soln%vel(i,j,k) = k+j*10+i**2
soln%vel_r(i,j,k) = 0.0D0
enddo
enddo
enddo
!$acc enter data create(soln)
!$acc enter data copyin(soln%vel(:,:,:), soln%vel_r(:,:,:))
!$acc kernels present(soln)
!$acc loop independent !collapse(2)
do j = 1, jmax
!$acc loop independent
do i = 1, imax
call set_upper_wall(imax, jmax, kmax, i, j, &
soln%vel, soln%vel_r)
enddo
enddo
!$acc end kernels
!$acc exit data copyout(soln%vel(:,:,:), soln%vel_r(:,:,:))
!$acc exit data delete(soln)
write(*,*) soln%vel(imax,2:jmax,kmax)
write(*,*) soln%vel_r(imax,2:jmax,kmax)
call deallocate_vel_array(imax, jmax, kmax, soln)
end program derived_data_para_inout_test
% pgfortran -V17.4 -ta=tesla:cc60 -Minfo=accel -fast test.F90
set_upper_wall:
49, Generating acc routine seq
Generating Tesla code
linear_extrap:
65, Generating acc routine seq
Generating Tesla code
derived_data_para_inout_test:
115, Generating enter data create(soln)
116, Generating enter data copyin(soln%vel_r(:,:,:),soln%vel(:,:,:))
118, Generating present(soln)
120, Loop is parallelizable
122, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
120, !$acc loop gang, vector(4) ! blockidx%y threadidx%y
122, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
129, Generating exit data copyout(soln%vel(:,:,:),soln%vel_r(:,:,:))
130, Generating exit data delete(soln)
% a.out
32.00000000000000 42.00000000000000 52.00000000000000
66.00000000000000 86.00000000000000 106.0000000000000
I found a new problem. I am now working on optimizing my CFD code which has ghost cells, and it seems that the solution is always incorrect, potentially due to indexing error. Therefore I modified the simple code (which I used previously) by changing:
However, when Icompiled and ran, I got incorrect result:
[weich97@thermisto:Fortran_Test]$ pgfortran -acc -Minfo=accel derived_data_para_inout_test.f90 -o derived_data_para_inout_test
set_upper_wall:
53, Generating acc routine seq
linear_extrap:
72, Generating acc routine seq
derived_data_para_inout_test:
129, Generating enter data create(soln)
130, Generating enter data copyin(soln%vel(:,:,:),soln%vel_r(:,:,:))
132, Generating present(soln)
134, Loop is parallelizable
136, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
134, !$acc loop gang, vector(4) ! blockidx%y threadidx%y
136, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
143, Generating exit data copyout(soln%vel(:,:,:),soln%vel_r(:,:,:))
144, Generating exit data delete(soln)
[weich97@thermisto:Fortran_Test]$ ./derived_data_para_inout_test 34.00000000000000 44.00000000000000 54.00000000000000
0.000000000000000 0.000000000000000 0.000000000000000
Since in my CFD code, ghost cells are used (so the index does not starts from 1, it starts from a negative value ), I need to make sure there is a way to pass data correctly into routines, when OpenACC directives are used. Could you please have a check and provide some suggestions for me? I would greatly appreciate your help!
The error is in your Fortran code and will fail with our without the OpenACC directives enabled.
The problem is that you’re passing in the arrays as assumed shape (i.e. “dimension(:,:,:)”) in which case the lower bound is presumed to be 1. If you instead use the explicit bounds or map your indices (i.e. -1 becomes 1, 0 becomes 2, etc.), then it should work.
I just uncommented out the two spots where you use the explicit bounds and your code then worked as expected.