CUDA Fortran : device variable in module

In CUDA Fortran, can device subroutines directly access device data in the same module? I thought the answer was yes, but I get a strange compiler warning (and the code crashes) when I try to do it. For example, the code below compiles, but with these warnings:

warning: cast to pointer from integer of different size
Warning: Cannot tell what pointer points to, assuming global memory space

module cudamod
   implicit none

   integer, device, allocatable, dimension(:)  :: int_d

contains

   attributes(global) subroutine foo
      int_d(threadidx%x) = threadidx%x
   end subroutine foo
end module cudamod

program fcuda
   use cudafor
   use cudamod 
   implicit none

   integer :: int_h(16)

   int_h = 0

   call foo<<<1,16>>>
   int_h = int_d
   print *,'int_h = ',int_h
end program fcuda

When I run it, I get

copyout Memcpy FAILED:4

Any help is appreciated!


Unfortunately, support for device data in a module didn’t make it into the 10.0 release. It’s coming, but most likely not until early 2010. In the meantime, you’ll need to pass the variable into the kernel as an argument.
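
For anyone wanting to see what that workaround might look like, here is a minimal sketch based on the code in the question. It assumes the device array is declared and allocated in the host program instead of the module; the dummy argument names a and n are my own and are not from the original post.

```fortran
module cudamod
   implicit none

contains

   ! The device array arrives as a dummy argument rather than
   ! being picked up from module scope.
   attributes(global) subroutine foo(a, n)
      integer, value  :: n      ! number of elements (hypothetical argument)
      integer, device :: a(n)   ! device array passed in by the host
      integer :: i

      i = threadidx%x
      if (i <= n) a(i) = i      ! guard against extra threads
   end subroutine foo
end module cudamod

program fcuda
   use cudafor
   use cudamod
   implicit none

   integer :: int_h(16)
   integer, device, allocatable :: int_d(:)   ! now declared on the host side

   allocate(int_d(16))
   int_h = 0

   call foo<<<1,16>>>(int_d, 16)
   int_h = int_d                ! device-to-host copy
   print *,'int_h = ',int_h

   deallocate(int_d)
end program fcuda
```

Note that the sketch also allocates int_d before the launch, which the code in the question does not do.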


Is this included in the Jan 7 release?

Sorry, no. I’ll update this post once it’s available.

  • Mat

I get the same warning as RTLEE when I compile the following code:

module cudamod

  use cudafor

  real(kind=8), dimension(:), allocatable, device :: vec_dev

contains


  attributes(device) subroutine sub1(i)
    use cudafor
    implicit none
    integer, value :: i
    real(kind=8), dimension(32) :: vec0
    integer :: tid
    tid = threadidx%x
    vec0(tid) = vec_dev(i)
  end subroutine

  attributes(global) subroutine kernel_test(N,nblocks,nthreads)
    use cudafor
    implicit none
    integer, value :: N, nthreads, nblocks
    integer :: i, idx

    idx = (blockidx%x-1)*blockdim%x + threadidx%x
    do i=idx,N,nthreads*nblocks
        call sub1(i)
    end do

  end subroutine

end module

program test

  use cudafor
  use cudamod

  integer, parameter :: N = 1000
  real(kind=8), dimension(N) :: vector  
  integer :: nblocks, nthreads

  vector = 1.D0
  vec_dev = vector(1:N)
  nblocks = 30
  nthreads = 256
  call kernel_test<<<nblocks,nthreads>>>(N,nblocks,nthreads)

end program

From the PGI CUDA Fortran documentation and Mat’s answer, I had assumed that a kernel is not allowed to access device data declared in a module, but that any device subprogram in that module, and any host program that uses the module, can.

Am I wrong?

Thank you in advance for any explanation!

I see the same issue. For now, don’t access device data from a module.


Hi Goblinqueen,

Currently, only fixed-size device arrays are allowed in a module. However, we are actively working on adding support for allocatable ones (it’s our most requested feature). It should be available in 10.6 or possibly sooner.
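
As a rough illustration of the fixed-size case, the code from the first post could be adapted along these lines. This is only a sketch: the parameter n is a name introduced here, and whether it compiles cleanly will depend on the compiler release you have.

```fortran
module cudamod
   implicit none

   integer, parameter :: n = 16      ! hypothetical compile-time size
   integer, device    :: int_d(n)    ! fixed-size device array in the module

contains

   attributes(global) subroutine foo()
      ! int_d is accessed directly from module scope
      int_d(threadidx%x) = threadidx%x
   end subroutine foo
end module cudamod

program fcuda
   use cudafor
   use cudamod
   implicit none

   integer :: int_h(n)

   int_h = 0

   call foo<<<1,n>>>()
   int_h = int_d                     ! device-to-host copy
   print *,'int_h = ',int_h
end program fcuda
```

No allocate is needed here because the array size is known at compile time.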

  • Mat

Thank you, that would be very important!