Can you please provide an example of how to use the new PGI 11 “device present” directive? I am a bit confused about the difference between the “device present” and “reflected” directives.
Thank you
Hi sindimo,
The main difference is when the association between the host data and its device copy is made. With ‘reflected’, the association is made at compile time. With ‘present’, it is made at run time. This allows ‘present’ to associate global variables as well as arguments. Also, if you are passing device data down through multiple calls, ‘present’ removes the need to add ‘reflected’ to each subroutine.
Note that “present” is new in the 1.3 version of the PGI Accelerator Model design spec and will be available later this year.
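To make the difference concrete, here is a rough, untested sketch of how the run-time lookup might be used once it is available. It assumes the clause is spelled `present(...)` on a compute region, as described in the 1.3 spec; the names `present_demo`, `scale_vec` and `middle` are made up for illustration. Note that the intermediate routine needs no directive at all, whereas with ‘reflected’ each routine in the call chain would need one.

```fortran
! Sketch only: the 'present' clause spelling is assumed from the 1.3 spec
! and has not been tested against a released compiler.
module present_demo
  implicit none
contains
  ! Leaf routine: no 'reflected' declaration. The region is told that
  ! x and y are already on the device and finds them at run time.
  subroutine scale_vec(x, y, s)
    real, intent(in)  :: x(:)
    real, intent(out) :: y(:)
    real, intent(in)  :: s
    integer :: i
    !$acc region present(x, y)
    do i = 1, size(x)
      y(i) = s * x(i)
    enddo
    !$acc end region
  end subroutine

  ! Intermediate routine: only passes the arrays through, no directives.
  subroutine middle(x, y, s)
    real, intent(in)  :: x(:)
    real, intent(out) :: y(:)
    real, intent(in)  :: s
    call scale_vec(x, y, s)
  end subroutine
end module

program demo
  use present_demo
  implicit none
  integer, parameter :: n = 8
  real :: x(n), y(n)
  integer :: i
  do i = 1, n
    x(i) = real(i)
  enddo
  ! The data region creates the device copies that 'present' later finds.
  !$acc data region copyin(x) copyout(y)
  call middle(x, y, 2.0)
  !$acc end data region
  print *, y(1), y(n)
end program
```

Built without accelerator support, the directives are plain comments and the program simply computes y = 2*x on the host, so the logic can be checked before any device is involved.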
Hope this helps,
Mat
Also we would be interested in examples of the reflected and device present directives. Could anyone provide a pointer?
Thanks, --Will
Hi Willi,
Here’s an example of using ‘reflected’ and ‘mirror’. ‘device present’ isn’t implemented yet, so I don’t have an example of it.
Hope this helps,
Mat
$ cat test.f90
module mm
  implicit none
  integer, parameter :: n=40,m=50
  integer :: oo = 2
  real, dimension(:,:), allocatable :: a
  ! mirror: the device copy of a follows the allocation state of the host copy
  !$acc mirror(a)
contains
  subroutine sub1( b, c, w )
    implicit none
    real :: b(:,:), c(:,:), w(2)
    ! reflected: the caller already has a device copy of b
    !$acc reflected(b)
    integer :: i,j
    !$acc region
    do j = oo+1,ubound(a,2)-oo
      do i = oo+1,ubound(a,1)-oo
        a(i,j) = b(i,j)*w(1) + c(i,j)*w(2)
      enddo
    enddo
    !$acc end region
  end subroutine
  subroutine sub2( b, c, w )
    implicit none
    ! w has two elements, matching the actual argument and sub1
    real :: b(:,:), c(:,:), w(2)
    integer :: i
    ! b is copied to the device once and reused by both calls to sub1
    !$acc data region copyin(b)
    do i = 1,2
      call sub1(b,c,w )
    enddo
    !$acc end data region
  end subroutine
end module
program p
  use mm
  use accel_lib
  implicit none
  real :: b(n,m), c(n,m), w(2), aa(n,m)
  integer :: i,j
  allocate(a(n,m))
  do j = 1,m
    do i = 1,n
      aa(i,j) = -1.0
      a(i,j) = -1.0
      b(i,j) = (j*100) + i
      c(i,j) = -(j*100) + i
    enddo
  enddo
  w(1) = 1.5
  w(2) = 0.5
  call sub2(b,c,w)
  ! copy the interior of the mirrored array back to the host before printing
  !$acc update host(a(oo+1:n-oo,oo+1:m-oo))
  print *, a(5,5), a(n-2,n-2)
end program
$ pgf90 test.f90 -Minfo; a.out
510.0000 3876.000
$ pgf90 mm40.f90 -Minfo -ta=nvidia; a.out
sub1:
18, Generating local(b(:,:))
20, Generating copyin(c(:,:))
Generating copyin(w(1:2))
Generating compute capability 1.0 binary
Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
21, Loop is parallelizable
22, Loop is parallelizable
Accelerator kernel generated
21, !$acc do parallel, vector(16) ! blockidx%y threadidx%y
22, !$acc do parallel, vector(16) ! blockidx%x threadidx%x
Cached references to size [2] block of 'w'
CC 1.0 : 14 registers; 120 shared, 16 constant, 0 local memory bytes; 66% occupancy
CC 1.3 : 14 registers; 120 shared, 16 constant, 0 local memory bytes; 100% occupancy
CC 2.0 : 20 registers; 16 shared, 116 constant, 0 local memory bytes; 100% occupancy
sub2:
35, Generating copyin(b(:,:))
p:
63, Generating !$acc update host(a(oo+1:-oo+40,oo+1:-oo+50))
510.0000 3876.000