In CUDA Fortran, can device subroutines directly access device data declared in the same module? I thought the answer was yes, but I get strange compiler warnings (and the code crashes) when I try it. For example, the code below compiles with these warnings:
warning: cast to pointer from integer of different size
Warning: Cannot tell what pointer points to, assuming global memory space
module cudamod
  implicit none
  integer, device, allocatable, dimension(:) :: int_d
contains
  attributes(global) subroutine foo
    int_d(threadidx%x) = threadidx%x
  end subroutine foo
end module cudamod

program fcuda
  use cudafor
  use cudamod
  implicit none
  integer :: int_h(16)
  int_h = 0
  allocate(int_d(16))
  call foo<<<1,16>>>
  int_h = int_d
  print *,'int_h = ',int_h
  deallocate(int_d)
end program fcuda
Unfortunately, support for device data in a module didn’t make it into 10.0. It’s coming, but most likely not until early 2010. In the meantime, you’ll need to pass the variable into the kernel as an argument.
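As a rough illustration of that workaround, here is a minimal sketch of the first example with the device array passed to the kernel as an argument instead of being read from the module. The dummy argument names `a` and `n` and the bounds check are my own additions, not part of the original post, and I have not verified this against the 10.0 compiler specifically.

```fortran
module cudamod
  implicit none
contains
  ! The kernel receives the device array as a dummy argument
  ! (hypothetical names a and n) rather than using module data.
  attributes(global) subroutine foo(a, n)
    integer, value  :: n
    integer, device :: a(n)
    integer :: i
    i = threadidx%x
    if (i <= n) a(i) = i   ! guard against threads beyond the array bounds
  end subroutine foo
end module cudamod

program fcuda
  use cudafor
  use cudamod
  implicit none
  integer, parameter :: n = 16
  integer :: int_h(n)
  integer, device, allocatable :: int_d(:)   ! declared in the host program, not the module

  int_h = 0
  allocate(int_d(n))
  call foo<<<1,n>>>(int_d, n)   ! pass the device array explicitly
  int_h = int_d                 ! device-to-host copy
  print *, 'int_h = ', int_h
  deallocate(int_d)
end program fcuda
```

The array is declared in the host program here so that the kernel never touches module-level device data; an explicit-shape dummy with the size passed by value keeps the interface simple.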
Hi,
I get the same warning as RTLEE when I compile the following code:
module cudamod
  use cudafor
  real(kind=8), dimension(:), allocatable, device :: vec_dev
contains
  attributes(device) subroutine sub1(i)
    use cudafor
    implicit none
    integer, value :: i
    ! One slot per thread: must hold at least blockdim%x elements
    ! (256 threads per block in the launch below).
    real(kind=8), dimension(256) :: vec0
    integer :: tid
    tid = threadidx%x
    vec0(tid) = vec_dev(i)
  end subroutine

  attributes(global) subroutine kernel_test(N,nblocks,nthreads)
    use cudafor
    implicit none
    integer, value :: N, nthreads, nblocks
    integer :: i, idx
    ! Global thread index, then a grid-stride loop over the N elements
    idx = (blockidx%x-1)*blockdim%x + threadidx%x
    do i=idx,N,nthreads*nblocks
      call sub1(i)
    end do
  end subroutine
end module

program test
  use cudafor
  use cudamod
  integer, parameter :: N = 1000
  real(kind=8), dimension(N) :: vector
  integer :: nblocks, nthreads
  vector = 1.D0
  allocate(vec_dev(N))
  vec_dev = vector(1:N)
  nblocks = 30
  nthreads = 256
  call kernel_test<<<nblocks,nthreads>>>(N,nblocks,nthreads)
  deallocate(vec_dev)
end program
From the PGI CUDA Fortran documentation and from Mat’s answer, I had assumed that a kernel is not allowed to access device data declared in a module, but that any device subprogram in that module, and any host program that uses the module, can access it.
Currently only fixed-size device arrays are allowed in a module. However, we are actively working on adding support for allocatable ones (it’s our most requested feature). It should be available in 10.6 or possibly sooner.
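For comparison, a fixed-size version of the first example might look like the sketch below. This assumes a compiler release in which fixed-size module device arrays are accepted, as described above; the parameter `n` is a name I introduced for illustration, and I have not tested this against any particular release.

```fortran
module cudamod
  implicit none
  integer, parameter :: n = 16      ! hypothetical compile-time size
  integer, device :: int_d(n)       ! fixed size: no allocatable attribute
contains
  attributes(global) subroutine foo()
    ! Assumes the launch uses at most n threads in a single block
    int_d(threadidx%x) = threadidx%x
  end subroutine foo
end module cudamod

program fcuda
  use cudafor
  use cudamod
  implicit none
  integer :: int_h(n)

  int_h = 0
  call foo<<<1,n>>>()
  int_h = int_d                     ! device-to-host copy
  print *, 'int_h = ', int_h
end program fcuda
```

Because the size is known at compile time, no allocate or deallocate calls are needed on the host side.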