Hi,
Here is a test case in which I allocate a device variable that is declared in a separate module.
--8<-- compile with pgf90 -Mcuda mod1.cuf kernel.cuf main.f90
PROGRAM main
  USE test_mod
  CALL test
END PROGRAM
--
MODULE mod1
  IMPLICIT NONE
  INTEGER, DEVICE, ALLOCATABLE :: foo
CONTAINS
END MODULE
--
module test_mod
  use cudafor
  use mod1 ! this fails
  ! INTEGER, DEVICE, ALLOCATABLE :: foo ! this works
contains
  attributes(global) subroutine test_kernel( )
    foo = 42
  end subroutine
  subroutine test
    REAL*8 temp
    integer r
    type(dim3) :: dimGrid, dimBlock
    allocate(foo)
    dimGrid = dim3(1, 1, 1)
    dimBlock = dim3(1, 1, 1)
    call test_kernel<<<dimGrid,dimBlock>>>()
    r = cudathreadsynchronize()
    write(0,*) "Value of cudathreadsynchronize = ", r
    temp = foo
    write(0,*) temp
  end subroutine
end module
--8<--
When I run the program, it produces the following output:
--
Value of cudathreadsynchronize = 4
copyout Memcpy (host=0x7ffffa7af254, dev=0x110000, size=4) FAILED:4
However, if I declare the device variable in the same module where it is allocated (the commented-out declaration in test_mod), the test succeeds.
My system configuration is: Ubuntu 10.04 (64-bit), GeForce GTX 260, the stock NVIDIA driver packaged by Ubuntu, and PGI Accelerator Fortran Workstation 10.5.
Is this a compiler bug? If so, is there a workaround that still lets me allocate variables declared in separate modules?
Regards,
Claude Knaus