Hi Fedele.Stabile,
While your program has a number of issues, the “.STATICS3” error does appear to be a compiler issue when a program using ACC data regions outside of a module is compiled with -Mcuda. I have written a report (TPR#18533) and sent it to our compiler engineers for further investigation.
The issues in your code are: the loop bounds are wrong (it uses nvec instead of n1), no arguments are passed to “compute”, the dummy argument “a” is declared with mirror (it should be reflected), the interface to “compute” is missing, and “adev” is unnecessary since “a” is mirrored.
Here’s the corrected code. Note that the “.STATICS3” error will still occur if you add -Mcuda.
% cat test.f90
program main
  implicit none
  integer, parameter :: n1=10, nlev=60
  real, dimension(n1,nlev) :: a
  integer :: loop
  !$acc mirror(a)
  interface
    subroutine compute (n1,nlev,a)
      integer :: n1, nlev
      real, dimension(n1,nlev) :: a
      !$acc reflected(a)
    end subroutine compute
  end interface
  a=0.1
  !$acc update device(a)
  call compute(n1,nlev,a)
  !$acc update host(a)
  print*, sum(a)
end program main

subroutine compute (n1,nlev,a)
  integer :: n1, nlev
  real, dimension(n1,nlev) :: a
  integer :: i,k
  !$acc reflected(a)
  !$acc region
  do i=1,n1
    do k=1,nlev
      a(i,k)=a(i,k)*a(i,k)
    end do
  end do
  !$acc end region
end subroutine compute
% pgfortran test.f90 -V12.3 -Minfo=accel -ta=nvidia
main:
      7, Generating local(a(:,:))
     19, Generating update device(a(:,:))
     21, Generating update host(a(:,:))
compute:
     31, Generating reflected(a(:,:))
     33, Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     34, Loop is parallelizable
     35, Loop is parallelizable
         Accelerator kernel generated
         34, !$acc do parallel, vector(16) ! blockidx%x threadidx%x
         35, !$acc do parallel, vector(16) ! blockidx%y threadidx%y
             CC 1.0 : 6 registers; 48 shared, 8 constant, 0 local memory bytes; 100% occupancy
             CC 2.0 : 8 registers; 8 shared, 56 constant, 0 local memory bytes; 100% occupancy
% a.out
6.000042
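As a sanity check on that value, here is a minimal host-only sketch of the same computation with the accelerator directives removed. The program name “host_check” is just a placeholder I made up; the sizes and the initial value are taken from test.f90 above. Treat it as a rough reference rather than part of the fix.

```fortran
! Host-only reference for test.f90: no accelerator directives, so it
! builds with any Fortran compiler. "host_check" is a hypothetical name.
program host_check
  implicit none
  integer, parameter :: n1 = 10, nlev = 60
  real, dimension(n1,nlev) :: a
  integer :: i, k

  a = 0.1
  ! Square every element, as "compute" does on the device.
  do k = 1, nlev
    do i = 1, n1
      a(i,k) = a(i,k) * a(i,k)
    end do
  end do

  ! 600 elements, each 0.1**2 = 0.01, so the exact sum is 6.0.
  ! Single-precision rounding accounts for any trailing digits.
  print *, sum(a)
end program host_check
```

The printed value should be close to 6.0. The last few digits can differ from the accelerator run because the summation order and rounding are not guaranteed to match.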
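Moving “compute” into a module works around the “.STATICS3” issue. Since a module procedure has an explicit interface, the interface block in the main program is no longer needed.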
% cat test2.f90
module foo
contains
  subroutine compute (n1,nlev,a)
    integer :: n1, nlev
    real, dimension(n1,nlev) :: a
    integer :: i,k
    !$acc reflected(a)
    !$acc region
    do i=1,n1
      do k=1,nlev
        a(i,k)=a(i,k)*a(i,k)
      end do
    end do
    !$acc end region
  end subroutine compute
end module foo

program main
  use foo
  implicit none
  integer, parameter :: n1=10, nlev=60
  real, dimension(n1,nlev) :: a
  integer :: loop
  !$acc mirror(a)
  a=0.1
  !$acc update device(a)
  call compute(n1,nlev,a)
  !$acc update host(a)
  print*, sum(a)
end program main
% pgfortran test2.f90 -V12.3 -Minfo=accel -ta=nvidia -Mcuda
compute:
      9, Generating reflected(a(:,:))
     12, Loop is parallelizable
     13, Loop is parallelizable
         Accelerator kernel generated
         12, !$acc do parallel, vector(16) ! blockidx%x threadidx%x
         13, !$acc do parallel, vector(16) ! blockidx%y threadidx%y
main:
     29, Generating local(a(:,:))
     33, Generating update device(a(:,:))
     35, Generating update host(a(:,:))
% a.out
6.000042
Thanks!
Mat