Example for "device present" Directive

Can you please provide an example of how to use the new PGI 11 “device present” directive? I am a bit confused about the difference between the “device present” and “reflected” directives.

Thank you

Hi sindimo,

The main difference is when the association between the host and device copies occurs. With ‘reflected’, the association occurs at compile time. With ‘present’, the association occurs at run time. This allows ‘present’ to associate global variables as well as arguments. Also, if you are passing device data down through multiple levels of calls, ‘present’ removes the need to add ‘reflected’ to each subroutine along the way.

Note that “present” is new in the 1.3 version of the PGI Accelerator Model design spec and will be available later this year.

Hope this helps,

Also we would be interested in examples of the reflected and device present directives. Could anyone provide a pointer?

Thanks, --Will

Hi Willi,

Here’s an example of using ‘reflected’ and ‘mirror’. ‘device present’ isn’t implemented yet, so I don’t have an example of it.

Hope this helps,

$ cat test.f90 

module mm
 implicit none
 integer, parameter :: n=40,m=50
 integer :: oo = 2
 real, dimension(:,:), allocatable :: a
 !$acc mirror(a)

contains

 subroutine sub1( b, c, w )
  implicit none
  real :: b(:,:), c(:,:), w(2)
  !$acc reflected(b)
  integer :: i,j
  !$acc region
   do j = oo+1,ubound(a,2)-oo
    do i = oo+1,ubound(a,1)-oo
     a(i,j) = b(i,j)*w(1) + c(i,j)*w(2)
    end do
   end do
  !$acc end region
 end subroutine

 subroutine sub2( b, c, w )
  implicit none
  real :: b(:,:), c(:,:), w(2)
  integer :: n, m
  integer :: i

  !$acc data region copyin(b) 
  do i = 1,2
   call sub1(b,c,w )
  end do
  !$acc end data region 

 end subroutine
end module

program p
 use mm
 use accel_lib
 implicit none
 real :: b(n,m), c(n,m), w(2), aa(n,m)
 integer :: i,j
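 ! a is mirrored, so allocating it on the host also allocates the device copy
 allocate(a(n,m))
 do j = 1,m
  do i = 1,n
   aa(i,j) = -1.0
   a(i,j) = -1.0
   b(i,j) = (j*100) + i
   c(i,j) = -(j*100) + i
  end do
 end do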

  w(1) = 1.5
  w(2) = 0.5
  call sub2(b,c,w)
  !$acc update host(a(oo+1:n-oo,oo+1:m-oo))

  print *, a(5,5), a(n-2,n-2)
end program

$ pgf90 test.f90 -Minfo; a.out
    510.0000        3876.000    
$ pgf90 mm40.f90 -Minfo -ta=nvidia; a.out
     18, Generating local(b(:,:))
     20, Generating copyin(c(:,:))
         Generating copyin(w(1:2))
         Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     21, Loop is parallelizable
     22, Loop is parallelizable
         Accelerator kernel generated
         21, !$acc do parallel, vector(16) ! blockidx%y threadidx%y
         22, !$acc do parallel, vector(16) ! blockidx%x threadidx%x
             Cached references to size [2] block of 'w'
             CC 1.0 : 14 registers; 120 shared, 16 constant, 0 local memory bytes; 66% occupancy
             CC 1.3 : 14 registers; 120 shared, 16 constant, 0 local memory bytes; 100% occupancy
             CC 2.0 : 20 registers; 16 shared, 116 constant, 0 local memory bytes; 100% occupancy
     35, Generating copyin(b(:,:))
     63, Generating !$acc update host(a(oo+1:-oo+40,oo+1:-oo+50))
    510.0000        3876.000
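
For comparison, here is a rough sketch of the run-time association that ‘present’ is described as providing earlier in the thread. Since ‘device present’ is not implemented in the PGI Accelerator model at the time of this thread, the sketch is an assumption: it uses the `present` clause as spelled in OpenACC, the later standard that grew out of this model, and the eventual PGI 1.3 syntax may differ. The names mm_present, scale_add, middle and p_present are made up for illustration. The point is that the intermediate routine needs no directive at all, and the innermost routine looks up the device copies at run time instead of declaring its arguments ‘reflected’.

```fortran
module mm_present
 implicit none
 integer, parameter :: n=40, m=50
contains
 ! Innermost routine: present(a,b) asks the runtime to find the device
 ! copies created by an enclosing data region (OpenACC spelling, assumed).
 subroutine scale_add( a, b, w )
  implicit none
  real :: a(:,:), b(:,:), w
  integer :: i,j
  !$acc kernels present(a,b)
  do j = 1,size(a,2)
   do i = 1,size(a,1)
    a(i,j) = a(i,j) + w*b(i,j)
   end do
  end do
  !$acc end kernels
 end subroutine

 ! Intermediate routine: just passes the arrays through, no directives.
 subroutine middle( a, b, w )
  implicit none
  real :: a(:,:), b(:,:), w
  call scale_add( a, b, w )
 end subroutine
end module

program p_present
 use mm_present
 implicit none
 real :: a(n,m), b(n,m)
 a = 1.0
 b = 2.0
 ! The data region creates the device copies that present() finds later.
 !$acc data copy(a) copyin(b)
 call middle( a, b, 0.5 )
 !$acc end data
 print *, a(1,1), a(n,m)
end program
```

Without an accelerator-enabled compile the directives are plain comments and the program runs on the host. Either way, every element of a should end up as 1.0 + 0.5*2.0 = 2.0.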