Hi everyone!
I am using CUDA Fortran (PGI 11.8) on a Tesla M2050, and I am getting a very strange error:
0: copyout Memcpy (host=0x6c70d0, dev=0xa140000, size=16) FAILED: 4(unspecified launch failure)
It seems that the data transfer from device to host fails. My code is very simple (see below). If I change b1 from an array to a scalar, there is no error.
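module kernel_m
contains
  attributes(global) subroutine addkernel(A)
    integer,intent(inout)::A(:,:)
    integer::b1(1:2)        ! per-thread local array; with a scalar here the error goes away
    b1=2
    A(1,1)=A(1,1)+b1(1)
  end subroutine
end module kernel_m
program main
  use cudafor
  use kernel_m              ! explicit interface, needed for the <<<>>> launch and A(:,:)
  implicit none
  integer,device,allocatable::A_d(:,:)
  integer,allocatable::A(:,:)
  type(dim3)::dimGrid,dimBlock
  dimGrid=dim3(1,1,1)
  dimBlock=dim3(1,1,1)
  allocate(A(2,2))
  A=2
  allocate(A_d(2,2))
  A_d=A                     ! host -> device
  call addkernel<<<dimGrid,dimBlock>>>(A_d)
  A=A_d                     ! device -> host: the reported copyout failure appears here
end program main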
Can anybody tell me what is wrong?
Thank you in advance.
It seems that if I change "integer::b1(1:2)" to "integer,shared::b1(1:2)", it works. Can we not use registers to store an array in CUDA Fortran?
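A fault inside a CUDA kernel ("unspecified launch failure") is reported asynchronously, by the next runtime call that synchronizes with the device. In the program above that call is the A=A_d copy, which is why the message is labeled copyout even though the copy itself is probably not at fault. The following is a minimal sketch, assuming the same kernel as in the post, that checks for errors immediately after the launch so a failure is attributed to the kernel. The names kernels_m and check_launch are hypothetical, and cudaDeviceSynchronize is the name used by current nvfortran; older PGI releases may only provide cudaThreadSynchronize.

```fortran
module kernels_m
  use cudafor
  implicit none
contains
  ! Same kernel as in the post: a small per-thread local array
  attributes(global) subroutine addkernel(A)
    integer, intent(inout) :: A(:,:)
    integer :: b1(1:2)
    b1 = 2
    A(1,1) = A(1,1) + b1(1)
  end subroutine addkernel
end module kernels_m

program check_launch
  use cudafor
  use kernels_m
  implicit none
  integer, device, allocatable :: A_d(:,:)
  integer, allocatable :: A(:,:)
  integer :: istat

  allocate(A(2,2))
  allocate(A_d(2,2))
  A = 2
  A_d = A                       ! host -> device

  call addkernel<<<1,1>>>(A_d)

  ! Errors in the launch itself (bad configuration etc.) are visible right away
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) print *, 'launch error: ', cudaGetErrorString(istat)

  ! Faults during kernel execution only surface at a synchronizing call
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) print *, 'kernel error: ', cudaGetErrorString(istat)

  A = A_d                       ! device -> host
  print *, 'A(1,1) =', A(1,1)   ! 4 if the kernel ran correctly (2 + b1(1))
end program check_launch
```

If the "kernel error" line is printed, the problem lies in the kernel (here, the local array b1) rather than in the device-to-host transfer.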