Hi. I’m new to CUDA Fortran and am trying to compile the following code with the nvfortran compiler:
module defvars
  use cudafor
  implicit none
  integer, parameter :: n = 100
  type Type1
    integer, allocatable :: a(:)
    integer, allocatable, device :: a_dev(:)
  end type Type1
  type(Type1) :: T1
end module defvars

module calc
  use cudafor
  use defvars
  implicit none
contains
  attributes(global) subroutine kernel()
    integer :: i
    i = threadIdx%x
    T1%a_dev(i) = i
  end subroutine kernel
end module calc

program main
  use defvars
  use calc
  use cudafor
  implicit none
  allocate(T1%a(n))
  allocate(T1%a_dev(n))
  T1%a = 0
  T1%a_dev = T1%a
  call kernel<<<1,n>>>()
  T1%a = T1%a_dev
  print *, T1%a
end program main
Compilation fails with this error message: “NVFORTRAN-S-0520-Host MODULE data cannot be used in a DEVICE or GLOBAL subprogram”.
I’m trying to copy an array in a host derived type to the device, and then use that device array in a kernel subroutine. I understand that this can be done by passing the device array and its length as kernel arguments. However, my real application uses many arrays in a single kernel, so I want to wrap the arrays in a derived type and use them without listing them as kernel arguments, to keep my code simple and organized. Can this be achieved?
What is the benefit of putting them in a derived type? Can they just be declared at module scope, outside of a derived type? To reach a_dev inside the kernel, the compiler has to go through T1 on the device, but T1 itself is host module data, so it is difficult to set that up from the host.
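To illustrate the module-scope suggestion, here is a minimal sketch of the original program with the derived type removed and a_dev declared directly in the module with the device attribute. It assumes the compiler's default relocatable device code setting, so that the kernel in module calc can see device data from module defvars; the bounds check in the kernel is an addition that was not in the original code.

```fortran
module defvars
  use cudafor
  implicit none
  integer, parameter :: n = 100
  ! Host array and its device counterpart, both at module scope
  integer, allocatable :: a(:)
  integer, allocatable, device :: a_dev(:)
end module defvars

module calc
  use cudafor
  use defvars
  implicit none
contains
  attributes(global) subroutine kernel()
    integer :: i
    i = threadIdx%x
    ! a_dev is device module data, so the kernel may reference it directly
    ! (bounds check added for safety; not part of the original code)
    if (i <= n) a_dev(i) = i
  end subroutine kernel
end module calc

program main
  use defvars
  use calc
  use cudafor
  implicit none
  allocate(a(n))
  allocate(a_dev(n))
  a = 0
  a_dev = a              ! host-to-device copy
  call kernel<<<1,n>>>()
  a = a_dev              ! device-to-host copy
  print *, a
end program main
```

With this layout the kernel takes no arguments, which seems to be the goal, and any number of module-level device arrays can be added the same way. If the derived type has to stay, the other route already mentioned in the question still applies: keep T1 on the host and pass T1%a_dev and n as kernel arguments.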