problem with constant memory

Hello all,

I’m just starting to use CUDA Fortran and am running some simple examples, and I have a problem with constant memory.

This is the code of the main program:

program main

use kernel
use cudafor
implicit real*8 (a-h,o-z)

type(dim3) N_blocks, N_threads
integer, parameter :: N=128

real*8, dimension(N, N, N) :: res
real*8, dimension(N, N, N), device :: resD

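real*8, device :: x

! (The post does not show how x, theta, N_blocks and N_threads are set.)

call kernel<<<N_blocks, N_threads>>>(x, resD, N)

res = resD   ! copy the result back to the host

sum = 0.d0
do i=1,N
 do j=1,N
  do k=1,N
   sum = sum + res(i,j,k)
  end do
 end do
end do

print*,'sum = ', sum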

end program main

And this is my kernel:

module kernel

integer, constant :: i1=2, i2=2  
real*8, constant :: pi=acos(-1.d0)
real*8, dimension(2), constant :: theta

contains

attributes(global) subroutine kernel(x, resD, N)
use cudafor
implicit none
integer, value :: N
real*8, device :: x
real*8, dimension (N, N, N), device :: resD
integer i,j,k

! (The post does not show how i, j, k are computed from threadIdx and blockIdx.)


! resD(i,j,k)=i1*x*theta(1)   ! this works
  resD(i,j,k)=i2*x*theta(1)   ! but this does not work!

end subroutine kernel
end module kernel

The problem is that this kernel works when I use the constant i1 and fails when I use the constant i2
(error: “0: copyout Memcpy (host=0x6ade60, dev=0xf200000000, size=16777216) FAILED: 4(unspecified launch failure)”).
But there is no difference between i1 and i2 at all except one: I declared i1 first and then i2. If I declare i2 first and then i1, the code works for i2 and fails for i1…
I guess there is some problem with access to constant memory on the device, but I don’t understand what exactly the problem is. How should I use constant memory in order to skip such errors?

Hi Okat,

How should I use constant memory in order to skip such errors?

The workaround is to set the values of constant variables from host code rather than using data initialization.
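A minimal sketch of what setting constant variables from host code can look like. The module name consts_mod, the kernel fill_kernel, the array size and the values assigned to theta are all hypothetical and chosen only for illustration; they are not taken from the code in this thread.

```fortran
module consts_mod                  ! hypothetical module name
  use cudafor
  implicit none
  ! Declared without initializers; the values are assigned from host code.
  integer, constant :: i1, i2
  real(8), constant :: theta(2)
contains
  attributes(global) subroutine fill_kernel(a, n)   ! hypothetical kernel
    integer, value :: n
    real(8), device :: a(n)
    integer :: i
    ! One thread per array element.
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = i2 * theta(1)
  end subroutine fill_kernel
end module consts_mod

program demo
  use cudafor
  use consts_mod
  implicit none
  integer, parameter :: n = 256
  real(8) :: a(n)
  real(8), device :: aD(n)

  ! Plain assignments in host code copy the values into constant memory.
  i1 = 2
  i2 = 2
  theta = (/ 0.5d0, 1.5d0 /)       ! illustrative values only

  call fill_kernel<<<(n + 127) / 128, 128>>>(aD, n)
  a = aD                           ! copy the result back to the host
  print *, 'a(1) = ', a(1)
end program demo
```

The assignments must run before the kernel launch, since the kernel reads whatever is in constant memory at that point.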

Hope this helps,

Unfortunately, it doesn’t help… I have now added the definition of the constant variables i1 and i2 to the main program:


and in the kernel module I still have only the declaration of these variables:

integer, constant :: i1, i2

but the error is the same:
0: copyout Memcpy (host=0x6af1c0, dev=0xf200000000, size=16777216) FAILED: 4(unspecified launch failure)

Hi Okat,

My apologies. On further investigation, the problem appears to be due to the interaction of the constant variable and the scalar device argument “x”, not to constant data initialization. When I change “x” to be a host variable passed to the kernel by value, or move it to be a module device variable, the code runs correctly. Note that passing “x” by value also improves performance.
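A sketch of the pass-by-value variant, assuming a simple one-thread-per-element 3D launch. The names byvalue_mod, fill3d, grid and tblock, the 8x8x8 block shape, and the values given to x and theta are assumptions made for illustration, since the thread does not show them.

```fortran
module byvalue_mod                 ! hypothetical module name
  use cudafor
  implicit none
  integer, constant :: i1, i2
  real(8), constant :: theta(2)
contains
  attributes(global) subroutine fill3d(x, resD, n)   ! hypothetical kernel name
    real(8), value :: x            ! host scalar passed by value, not a device variable
    integer, value :: n
    real(8), device :: resD(n, n, n)
    integer :: i, j, k
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    j = (blockIdx%y - 1) * blockDim%y + threadIdx%y
    k = (blockIdx%z - 1) * blockDim%z + threadIdx%z
    if (i <= n .and. j <= n .and. k <= n) resD(i, j, k) = i2 * x * theta(1)
  end subroutine fill3d
end module byvalue_mod

program demo
  use cudafor
  use byvalue_mod
  implicit none
  integer, parameter :: n = 128
  real(8), allocatable :: res(:, :, :)
  real(8), device, allocatable :: resD(:, :, :)
  real(8) :: x                     ! ordinary host variable
  type(dim3) :: grid, tblock

  allocate(res(n, n, n), resD(n, n, n))
  i1 = 2
  i2 = 2
  theta = (/ 1.0d0, 2.0d0 /)       ! illustrative values only
  x = 3.0d0                        ! illustrative value only

  tblock = dim3(8, 8, 8)
  grid = dim3((n + 7) / 8, (n + 7) / 8, (n + 7) / 8)
  call fill3d<<<grid, tblock>>>(x, resD, n)

  res = resD                       ! copy the result back to the host
  print *, 'sum = ', sum(res)
end program demo
```

The other option mentioned, a module-level device variable, would instead declare x with the device attribute in the module and drop it from the kernel's argument list.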

I have reported this problem as TPR#18162. The good news is that the problem appears to have already been found and fixed internally, though the fix has not yet been added to a release. Unfortunately, it might be too late to get it into the upcoming 11.9 release, but I’ll see what we can do.


Mat, thank you very much, now it works.