Hi Mat.
Thanks for your help! I have to say I couldn’t get ‘acc_is_present’ to work reliably. It seems like PGI creates an implicit data region for inline_kernels when the data hasn’t yet been copied over - which ‘sort of’ makes sense, except we’re already guarding this case in the ‘if-clause’ in the kernels directive. However, it’s quite strange to me that adding a ‘write’ statement in the caller would change this behaviour. Do you have any idea on what’s going on?
I guess I will try to add present_or_copyin and present_or_copyout in the kernels construct for all the data, to force the compiler doing the right thing.
Edit: I just tried the same program, both with ‘present’ and ‘present_or_copy(in/out)’ clauses in the kernel directive - this didn’t change the outcome. So I don’t know a way to disable that behaviour.
module example
contains
subroutine host_only_subroutine(a, b, c, d)
use helper_functions
use cudafor
real, dimension(256, 256, 10), intent(in) :: a, b
real, dimension(256, 256, 10), intent(out) :: c, d
! write(0,*) "test1" !commenting this write statement will make the update host directive on line 41 fail
call inline_kernels(a, b, c, d)
end subroutine
subroutine wrapper_inline_kernels(a, b, c, d)
use helper_functions
use cudafor
real, intent(in) :: a(256, 256, 10), b(256, 256, 10)
real, intent(out) :: c(256, 256, 10), d(256, 256, 10)
integer(4) :: hf_symbols_are_device_present
hf_symbols_are_device_present = acc_is_present(a)
!$acc enter data copyin(a), copyin(c), copyin(b), copyin(d)
write(0,*) 'entering subroutine wrapper_inline_kernels'
write(0,*) 'calling kernel inline_kernels'
call inline_kernels (a, b, c, d)
!$acc exit data delete(a), copyout(c), delete(b), copyout(d)
end subroutine
subroutine inline_kernels(a, b, c, d)
use helper_functions
use cudafor
real, intent(in) :: a(256, 256, 10), b(256, 256, 10)
real, intent(out) :: c(256, 256, 10), d(256, 256, 10)
integer(4) :: y, x
integer(4) :: hf_symbols_are_device_present
hf_symbols_are_device_present = acc_is_present(a)
write(0,*) 'entering subroutine inline_kernels. device data present: ', hf_symbols_are_device_present
!$acc update host(a) if(hf_symbols_are_device_present)
write(0,*) 'a@1,1,1: ', a(1,1,1)
!$acc kernels if(hf_symbols_are_device_present)
!$acc loop independent vector(16)
do y=1,256
!$acc loop independent vector(16)
do x=1,256
!$acc loop seq
do z=1,10
c(x, y, z)= a(x, y, z)+ b(x, y, z)
end do
end do
end do
!$acc end kernels
!$acc kernels if(hf_symbols_are_device_present)
!$acc loop independent vector(16)
do y=1,256
!$acc loop independent vector(16)
do x=1,256
!$acc loop seq
do z=1,10
d(x, y, z)= a(x, y, z)* b(x, y, z)
end do
end do
end do
!$acc end kernels
end subroutine
end module example
program main
use example
real, dimension(256, 256, 10) :: a, b, c, d, e, f
integer :: x, y, z
integer :: fail_x, fail_y, fail_z
logical test
a(:,:,:) = 1.0d0
b(:,:,:) = 2.0d0
c(:,:,:) = 0.0d0
d(:,:,:) = 0.0d0
e(:,:,:) = 0.0d0
f(:,:,:) = 0.0d0
test = .TRUE.
call host_only_subroutine(a, b, c, d)
call wrapper_inline_kernels(c, d, e, f)
write(6,*) "calculation complete"
do y=1,256
do x=1,256
do z=1,10
if (test .EQ. .TRUE. .AND. e(x, y, z) .NE. 5.0d0) then
test = .FALSE.
fail_x = x
fail_y = y
fail_z = z
end if
if (test .EQ. .TRUE. .AND. f(x, y, z) .NE. 6.0d0) then
test = .FALSE.
fail_x = x
fail_y = y
fail_z = z
end if
end do
end do
end do
if (test .EQ. .TRUE.) then
write(6,*) "test ok"
else
write(6,*) "test failed"
write(6,*) "fails at", fail_x, fail_y, fail_z, "E:", c(fail_x, fail_y, fail_z), "F:", d(fail_x, fail_y, fail_z)
stop 2
end if
stop
end program main
Output when line 10 is commented out:
entering subroutine inline_kernels. device data present: 1
call to cuMemcpyDtoHAsync returned error 1: Invalid value
==> acc_is_present returns ‘1’ even though nothing should have been copied over at this point. the update directive subsequently fails.
Output when line 10 is NOT commented out:
test1
entering subroutine inline_kernels. device data present: 0
a@1,1,1: 1.000000
entering subroutine wrapper_inline_kernels
calling kernel inline_kernels
entering subroutine inline_kernels. device data present: 0
a@1,1,1: 3.000000
==> acc_is_present returns ‘0’ correctly in the first call of inline_kernels. The code works correctly. However: In the second call where acc_is_present is called within an explicit data region, acc_is_present still returns ‘0’. This seems to be an error as well, even though the code still works.