In CUDA Fortran, can device subroutines directly access device data declared in the same module? I thought the answer was yes, but I get strange compiler warnings (and the code fails at run time) when I try it. For example, the code below compiles with these warnings:
warning: cast to pointer from integer of different size
Warning: Cannot tell what pointer points to, assuming global memory space
module cudamod
  implicit none
  integer, device, allocatable, dimension(:) :: int_d
contains
  attributes(global) subroutine foo
    int_d(threadidx%x) = threadidx%x
  end subroutine foo
end module cudamod

program fcuda
  use cudafor
  use cudamod
  implicit none
  integer :: int_h(16)
  int_h = 0
  allocate(int_d(16))
  call foo<<<1,16>>>
  int_h = int_d
  print *,'int_h = ',int_h
  deallocate(int_d)
end program fcuda
When I run it, I get this error:
copyout Memcpy FAILED:4
Any help is appreciated!