Hi,
I have some constant coefficient matrices that are defined in modules. I would like to copy them to the GPU in the main program and then access them in subroutines through a use statement.
Here is an example:
! test programme
module data_par
  real :: coef(5)
  !$acc local(coef)
  DATA (coef(i),i=1,5)/ .1 , .4, .8, 1.0, 1.2/
end module data_par

module computation
  USE data_par
  implicit none
contains
  subroutine gpu_routine(nvec,nlev,a,ic)
    real, intent(inout) :: a(:,:)
    integer, intent(in) :: nvec,nlev,ic
    integer :: i,k
    !$acc reflected(a)
    !$acc region do kernel
    do i=1,nvec
      do k=2,nlev
        a(i,k)=coef(ic)*(a(i,k)+a(i,k-1))
      end do
    end do
    !$acc end region
  end subroutine gpu_routine
end module computation

program main
  USE data_par
  USE computation, only: gpu_routine
  implicit none
  real, allocatable :: a(:,:)
  !$acc mirror(a)
  integer, parameter :: n1=10000, nlev=60
  integer :: ic
  allocate(a(n1,nlev))
  !init a
  a=0.1
  !$acc update device(a)
  !$acc update device(coef)
  do ic=1,5
    call gpu_routine(n1,nlev,a,ic)
  end do
  !$acc update host(a)
  print*, sum(a)
end program main
Here the coefficient matrix coef is defined in module data_par.
This approach does not do what I want. The compiler gives the following messages:
...
gpu_routine:
19, Generating reflected(a(:,:))
21, Generating copyin(coef(ic))
..
46, update device(coef) is not within a data region for this array
This shows that a copyin of coef is generated inside the subroutine gpu_routine and that the update in the main program cannot be generated.
I then tried replacing
!$acc local(coef)
with
!$acc mirror(coef)
although this is probably not valid since coef is not allocatable. The compiler output looks good: there is no copyin, and it generates an update device(coef(:)). However, when I run the code I get the following error:
call to cuMemcpyHtoD returned error 1: Invalid value
CUDA driver version: 4000
I guess the array coef was not allocated on the device in this case.
Any idea how I should proceed?
Thanks,
Xavier