Hi Mat,
Here is an example.
========= Begin program ==========
module mod1
  real*8, allocatable :: a(:,:), b(:,:), c(:,:)
end module mod1
program prog1
  use mod1
  allocate(a(100,100),b(100,100),c(100,100))
  c=0.0d0
  a=1.13240d0
  b=2.33413d0
  call sub1
end program prog1
subroutine sub1
  use mod1
  integer i,j,k
!$acc data copyin(a,b) copy(c)
!$acc kernels loop present(a, b, c)
  do j=1,100
    do i=1,100
      do k=1,100
        c(i,j) = c(i,j)+a(i,k)*b(k,j)
      enddo
    enddo
  enddo
!$acc end kernels
!$acc end data
end subroutine sub1
=========End of program===============
======== Begin compiler output ===========
pgfortran -acc -Minfo main.f90
prog1:
      9, Memory zero idiom, array assignment replaced by call to pgf90_mzero8
     10, Memory set idiom, array assignment replaced by call to pgf90_mset8
     11, Memory set idiom, array assignment replaced by call to pgf90_mset8
sub1:
     20, Generating copyin(b(:,:))
         Generating copyin(a(:,:))
         Generating copy(c(:,:))
     22, Generating present_or_copy(c(:,:))
         Generating present_or_copyin(b(:,:))
         Generating present_or_copyin(a(:,:))
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     23, Loop is parallelizable
     24, Loop is parallelizable
     25, Complex loop carried dependence of 'c' prevents parallelization
         Loop carried dependence of 'c' prevents parallelization
         Loop carried backward dependence of 'c' prevents vectorization
         Inner sequential loop scheduled on accelerator
         Accelerator kernel generated
         23, !$acc loop gang ! blockidx%y
         24, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
         25, CC 1.3 : 17 registers; 136 shared, 4 constant, 0 local memory bytes
             CC 2.0 : 33 registers; 0 shared, 152 constant, 0 local memory bytes
==========End compiler output==================
If I delete present(a, b, c) from the kernels loop directive, the compiler output for sub1 is as follows:
sub1:
     20, Generating copyin(b(:,:))
         Generating copyin(a(:,:))
         Generating copy(c(:,:))
     22, Generating copy(c(:,:))
         Generating copyin(a(:,:))
         Generating copyin(b(:,:))
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
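For reference, here is a sketch of the variant that gives this second output. It assumes the only change is removing the present clause from the kernels loop directive; the module, the main program, and the rest of sub1 are the same as in the program above.

```fortran
! Sketch: same test case as above, with present(a, b, c) removed
! from the kernels loop directive (assumed to be the only change).
module mod1
  real*8, allocatable :: a(:,:), b(:,:), c(:,:)
end module mod1

program prog1
  use mod1
  allocate(a(100,100), b(100,100), c(100,100))
  c = 0.0d0
  a = 1.13240d0
  b = 2.33413d0
  call sub1
end program prog1

subroutine sub1
  use mod1
  integer i, j, k
  ! The enclosing data region is unchanged: a and b are copied in,
  ! c is copied in at entry and back out at exit.
!$acc data copyin(a,b) copy(c)
  ! No data clause on the compute construct in this variant.
!$acc kernels loop
  do j = 1, 100
    do i = 1, 100
      do k = 1, 100
        c(i,j) = c(i,j) + a(i,k)*b(k,j)
      enddo
    enddo
  enddo
!$acc end kernels
!$acc end data
end subroutine sub1
```

I compile it the same way as before, with pgfortran -acc -Minfo main.f90.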
Thanks,
Ping