I’m just starting to use CUDA Fortran and am running some simple examples, and I have a problem with constant memory.
This is the code of the main program:
program main
  use kernel
  use cudafor
  implicit real*8 (a-h,o-z)
  type(dim3) N_blocks, N_threads
  integer, parameter :: N=128
  real*8, dimension(N, N, N) :: res
  real*8, dimension(N, N, N), device :: resD
  real*8, device :: x
  theta=4
  x=1.d0
  N_blocks=dim3(N,N,1)
  N_threads=dim3(N,1,1)
  call kernel<<<N_blocks>>>(x,resD, N)
  res=resD
  sum=0.d0
  do i=1,N
    do j=1,N
      do k=1,N
        sum=sum+res(i,j,k)
      end do
    end do
  end do
  print*,'sum = ', sum
end program main
And this is my kernel:
module kernel
  integer, constant :: i1=2, i2=2
  real*8, constant :: pi=acos(-1.d0)
  real*8, dimension(2), constant :: theta
contains
  attributes(global) subroutine kernel(x, resD, N)
    use cudafor
    implicit none
    real*8, device :: x
    real*8, dimension (N, N, N), device :: resD
    integer, value :: N
    integer i,j,k
    k=threadIdx%x
    i=blockIdx%x
    j=blockIdx%y
    ! resD(i,j,k)=i1*x*theta(1) ! this works
    resD(i,j,k)=i2*x*theta(1) ! but this does not work!
  end subroutine kernel
end module kernel
The problem is that this kernel works when I use the constant i1, but does not work with the constant i2
(error - “0: copyout Memcpy (host=0x6ade60, dev=0xf200000000, size=16777216) FAILED: 4(unspecified launch failure)”).
But there are no differences between i1 and i2 at all, except one: I declared i1 first and then i2. If I declare i2 first and then i1, the code works for i2 and does not work for i1…
I guess there is some problem with access to constant memory on the device, but I don’t understand what exactly the problem is. How should I use constant memory in order to avoid such errors?