device variable in module

Is it possible yet (with 11.0) to place allocatable device variables inside a module? I know this has been a pending feature for a while, but I still get an error when I execute the following code on osx86.

0: ALLOCATE: copyin Symbol Memcpy FAILED:13(invalid device symbol)


module cudamod
   implicit none

   integer, device, allocatable, dimension(:)  :: int_d

contains

   attributes(global) subroutine foo
      int_d(threadidx%x) = threadidx%x
   end subroutine foo
end module cudamod

program fcuda 
   use cudafor 
   use cudamod 
   implicit none 

   integer :: int_h(16) 

   int_h = 0 

   allocate(int_d(16))
   call foo<<<1,16>>>
   int_h = int_d 
   print *,'int_h = ',int_h 
end program fcuda



Support for device module allocatables has been available since the 10.6 release, and your code runs fine with the 10.6 through 10.9 versions of the compiler. This appears to be a new bug in 11.0 that occurs only on Mac OS X; the code runs fine on Linux and Windows. I have sent a report to our engineers for further investigation (TPR#17589).

Thanks for the report,

% pgf90 -V10.9 test.cuf -Mcuda
% a.out
 int_h =             1            2            3            4            5 
            6            7            8            9           10           11 
           12           13           14           15           16
% pgf90 -V11.0 test.cuf -Mcuda
% a.out
0: ALLOCATE: copyin Symbol Memcpy FAILED:13(invalid device symbol)


I’m new to CUDA Fortran and have a related question/issue. I can compile and run the matmul.CUF example. I then try adding a module which defines an allocatable device array with no changes to the rest of the code to reference it:

module test_cuda
  use cudafor
  real, device, allocatable, dimension (:) :: testdev
end module test_cuda

The code compiles but when running, it simply exits with no output. If I drop the device attribute, i.e.:

module test_cuda
  use cudafor
  real, allocatable, dimension (:) :: testdev
end module test_cuda

The code compiles and runs as before (the added module is ignored, as I assumed it would have been earlier). What am I missing?

I’m running a trial license of PGI Workstation 11 under win32.

Hi Jason,

Does your code access testdev or is the only change that you added the four lines of source?

Module device data is currently accessible only to routines within the same module in which it is declared. The problem is that there isn’t a linker for device code, hence no way to associate external symbols. We are working on adding this capability by essentially doing the link dynamically at runtime (See:

  • Mat
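
A minimal sketch of the workaround this restriction implies, assuming the kernel can be moved into the module that declares the device array. The names test_cuda_sketch, fill_testdev, and use_testdev are hypothetical and not part of the matmul example, and cudaDeviceSynchronize is assumed to be available in the installed release (older releases may need cudaThreadSynchronize instead). Host code still allocates and copies the module array through a plain use statement.

```fortran
module test_cuda_sketch            ! hypothetical module name
  use cudafor
  implicit none

  ! Device data declared at module scope
  real, device, allocatable, dimension(:) :: testdev

contains

  ! The kernel lives in the same module as testdev, so it can reference it directly
  attributes(global) subroutine fill_testdev(n)
    integer, value :: n
    integer :: i
    i = (blockidx%x - 1) * blockdim%x + threadidx%x
    if (i <= n) testdev(i) = real(i)
  end subroutine fill_testdev

end module test_cuda_sketch

program use_testdev
  use cudafor
  use test_cuda_sketch
  implicit none
  integer, parameter :: n = 16
  real :: testhost(n)
  integer :: istat

  allocate(testdev(n))                 ! host code allocates the module device array
  call fill_testdev<<<1, n>>>(n)
  istat = cudaDeviceSynchronize()      ! wait for the kernel, then inspect its status
  if (istat /= cudaSuccess) print *, 'kernel failed: ', trim(cudaGetErrorString(istat))
  testhost = testdev                   ! device-to-host copy by assignment
  print *, 'testhost = ', testhost
  deallocate(testdev)
end program use_testdev
```

If a kernel in a different module or file needs the array, one option under the same restriction is to pass it to the kernel as a dummy argument rather than referencing the module variable directly.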


I get the same error with allocatable device variables. I use version 14.10. My executable works fine with an NVIDIA Quadro K2000, but with an NVIDIA K2200 I get

0: ALLOCATE: copyin Symbol Memcpy FAILED:13(invalid device symbol)

I tried the code submitted by RTLEE with the same result: the K2000 is OK, the K2200 gives an error.

Is this still a limitation, a bug, or what did I do wrong?

I used the following compiler flags

-tp=p7 -Mcuda -Mvect -Mquad -Mlre -lblas -Mcache_align -Mflushz -acc -Mbackslash -Minfo=all -Mpreprocess -Minline=

Thanks in advance


Hi Frank,

A K2200 uses the Maxwell (CC 5.0) architecture, which we don’t support. We officially support only the Tesla product line, which does not have a Maxwell-enabled device. Where a Quadro or GeForce card shares its architecture with a Tesla product, as the K2000 does, binaries should work.

  • Mat
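
To see which architecture a given card reports, the compute capability can be queried at runtime. This is a rough sketch, assuming the cudafor module's cudaGetDeviceCount and cudaGetDeviceProperties interfaces; the program name query_cc is hypothetical. It only reports what the driver sees and does not say whether a particular compiler release supports that device.

```fortran
program query_cc                     ! hypothetical program name
  use cudafor
  implicit none
  type(cudaDeviceProp) :: prop
  integer :: istat, ndev, idev

  ! Ask the runtime how many CUDA devices are visible
  istat = cudaGetDeviceCount(ndev)
  if (istat /= cudaSuccess) then
    print *, 'cudaGetDeviceCount failed: ', trim(cudaGetErrorString(istat))
    stop
  end if

  ! Device numbering starts at 0
  do idev = 0, ndev - 1
    istat = cudaGetDeviceProperties(prop, idev)
    if (istat /= cudaSuccess) cycle
    ! major.minor is the compute capability, e.g. 5.0 for the Maxwell card discussed above
    print '(a,i0,3a,i0,a,i0)', 'device ', idev, ': ', trim(prop%name), &
          ', compute capability ', prop%major, '.', prop%minor
  end do
end program query_cc
```

Comparing the reported compute capability against the targets the installed compiler release can generate code for is a quick way to tell an unsupported-architecture failure apart from a genuine bug.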

Is there a list of supported devices?

Please see: