OpenACC: Check if data present

In my code there’s lots of subroutines that are used from within host-only code as well as GPU enabled code. So far I’m still missing a way to tell the OpenACC to ignore directives when being in host mode. As I understand, the ‘if’ clause is intended for that matter. However, I cannot find documentation on how I could check whether data is present on the device or not, in order to use this as a condition for the if clause.

As an example, in the mpi communication code, i’d like a way to specify

!$acc update host(a) if (is_present(a))

This way I wouldn’t have to pass in a ‘this_is_host_code’ logical or implement the function with two wrappers or any of that - the code would become universally usable. Is there a way to do that?

Hi Michel,

In the upcoming OpenACC 2.5 standard, the behavior of “update” will change. If the data isn’t present, then the update will be ignored.

In the meantime, what you’ll need to do is call the “acc_is_present” routine, capture the result in a variable, then use this variable in the “if” clause. Something like:

#ifdef _OPENACC
int is_present = acc_is_present(A0,nx*ny*nz*sizeof(float));
#pragma acc update device (A0[0:nx*ny*nz]) if (is_present)
#endif
  • Mat

Hi Mat.

Thanks for your help! I have to say I couldn’t get ‘acc_is_present’ to work reliably. It seems like PGI creates an implicit data region for inline_kernels when the data hasn’t yet been copied over - which ‘sort of’ makes sense, except we’re already guarding this case in the ‘if-clause’ in the kernels directive. However, it’s quite strange to me that adding a ‘write’ statement in the caller would change this behaviour. Do you have any idea on what’s going on?

I guess I will try to add present_or_copyin and present_or_copyout in the kernels construct for all the data, to force the compiler doing the right thing.

Edit: I just tried the same program, both with ‘present’ and ‘present_or_copy(in/out)’ clauses in the kernel directive - this didn’t change the outcome. So I don’t know a way to disable that behaviour.

module example
contains

subroutine host_only_subroutine(a, b, c, d)
  use helper_functions
  use cudafor
  real, dimension(256, 256, 10), intent(in) :: a, b
  real, dimension(256, 256, 10), intent(out) :: c, d

!   write(0,*) "test1" !commenting this write statement will make the update host directive on line 41 fail
  call inline_kernels(a, b, c, d)
end subroutine

 subroutine wrapper_inline_kernels(a, b, c, d)
  use helper_functions
  use cudafor
  real, intent(in) :: a(256, 256, 10), b(256, 256, 10)
  real, intent(out) :: c(256, 256, 10), d(256, 256, 10)

  integer(4) :: hf_symbols_are_device_present
  hf_symbols_are_device_present = acc_is_present(a)
!$acc enter data copyin(a), copyin(c), copyin(b), copyin(d)
  write(0,*) 'entering subroutine wrapper_inline_kernels'

  write(0,*) 'calling kernel inline_kernels'
  call inline_kernels (a, b, c, d)

!$acc exit data delete(a), copyout(c), delete(b), copyout(d)
end subroutine

 subroutine inline_kernels(a, b, c, d)
  use helper_functions
  use cudafor
  real, intent(in) :: a(256, 256, 10), b(256, 256, 10)
  real, intent(out) :: c(256, 256, 10), d(256, 256, 10)

  integer(4) :: y, x
  integer(4) :: hf_symbols_are_device_present
  hf_symbols_are_device_present = acc_is_present(a)
  write(0,*) 'entering subroutine inline_kernels. device data present: ', hf_symbols_are_device_present
!$acc update host(a) if(hf_symbols_are_device_present)
  write(0,*) 'a@1,1,1: ', a(1,1,1)

!$acc kernels if(hf_symbols_are_device_present)
!$acc loop independent vector(16)
  do y=1,256
!$acc loop independent vector(16)
   do x=1,256
!$acc loop seq
    do z=1,10
     c(x, y, z)= a(x, y, z)+ b(x, y, z)
    end do
   end do
  end do
!$acc end kernels

!$acc kernels if(hf_symbols_are_device_present)
!$acc loop independent vector(16)
  do y=1,256
!$acc loop independent vector(16)
   do x=1,256
!$acc loop seq
    do z=1,10
     d(x, y, z)= a(x, y, z)* b(x, y, z)
    end do
   end do
  end do
!$acc end kernels
end subroutine
end module example

program main
use example
real, dimension(256, 256, 10) :: a, b, c, d, e, f
integer :: x, y, z
integer :: fail_x, fail_y, fail_z
logical test

a(:,:,:) = 1.0d0
b(:,:,:) = 2.0d0
c(:,:,:) = 0.0d0
d(:,:,:) = 0.0d0
e(:,:,:) = 0.0d0
f(:,:,:) = 0.0d0
test = .TRUE.

call host_only_subroutine(a, b, c, d)
call wrapper_inline_kernels(c, d, e, f)

write(6,*) "calculation complete"

do y=1,256
 do x=1,256
  do z=1,10
   if (test .EQ. .TRUE. .AND. e(x, y, z) .NE. 5.0d0) then
   test = .FALSE.
   fail_x = x
   fail_y = y
   fail_z = z
   end if
   if (test .EQ. .TRUE. .AND. f(x, y, z) .NE. 6.0d0) then
   test = .FALSE.
   fail_x = x
   fail_y = y
   fail_z = z
   end if
  end do
 end do
end do

if (test .EQ. .TRUE.) then
write(6,*) "test ok"
else
write(6,*) "test failed"
write(6,*) "fails at", fail_x, fail_y, fail_z, "E:", c(fail_x, fail_y, fail_z), "F:", d(fail_x, fail_y, fail_z)
stop 2
end if

stop
end program main

Output when line 10 is commented out:

entering subroutine inline_kernels. device data present: 1
call to cuMemcpyDtoHAsync returned error 1: Invalid value

==> acc_is_present returns ‘1’ even though nothing should have been copied over at this point. the update directive subsequently fails.

Output when line 10 is NOT commented out:

test1
entering subroutine inline_kernels. device data present: 0
a@1,1,1: 1.000000
entering subroutine wrapper_inline_kernels
calling kernel inline_kernels
entering subroutine inline_kernels. device data present: 0
a@1,1,1: 3.000000

==> acc_is_present returns ‘0’ correctly in the first call of inline_kernels. The code works correctly. However: In the second call where acc_is_present is called within an explicit data region, acc_is_present still returns ‘0’. This seems to be an error as well, even though the code still works.

Hi Michel,

The second issue is because “acc_is_present” is being implicitly typed. Adding a “use openacc” after “use cuda” will fix this issue.

For the first issue, I was not able to recreate it. I did have to comment out the “use helper_functions”, but I wouldn’t think that this would matter. Just in case, can you post this module? Also, what flags, compiler version and OS are you using?

One thought why something this would occur is if some optimization, like inlining, gets disabled when the write statement is there. Though, the subroutine inlines for me either way so I doubt that’s the true cause.

  • Mat

Hi Mat.

Thanks again! Yes, after adding ‘use openacc’ it works for me. Also, there was a logical error in my code in wrapper_inline_kernels: the ‘acc_is_present’ call should be after the ‘enter data’ directive obviously, but I think that didn’t matter in the tests I did above. The compiler command was as follows:

pgf90 -acc -Minfo=accel,inline,ipa -Mneginfo -Mcuda=cc2x,cc3x -ta=nvidia,cc2x,cc3x -fast -DGPU -c example.f90 -o example.o

.

Sorry about the ‘helper_functions’, I forgot to strip these out for this example. I’m actually integrating all these examples as unit tests / code examples for my framework that is supposed to abstract on top of OpenACC or CUDA Fortran. This particular one you can find here: https://github.com/muellermichel/Hybrid-Fortran/tree/master/examples/openACC_hybrid_hostonly. You’d have to clone Hybrid Fortran, set the ‘HF_DIR’ environment variable to the repo and run ‘make tests’ in the example directory if you wanted to see the finished thing. In this particular example I’m using a backend implementation that automatically compares a data trace to one that is created by the CPU-only-implementation - so the preprocessed code contains lots of additional tracing code.

Hi Michel,

I should have realized that you’re the author of Hybrid Fortran. We meet several years after your first release. Glad to see that its still going strong.

I’m at the NVIDIA GTC conference right now so can’t investigate this right now, but will as soon as I can.

  • Mat

Hey, have fun at the GTC - I’d like very much to be there, but I’m trying to finish a paper …