CUDA Fortran and CUDA API: Constant Memory and Symbol

This thread shouldn’t come as much of a surprise as I stumble my way through using the CUDA API with CUDA Fortran. After figuring out (thanks to Brent) how to do asynchronous copies of 1D and 2D data (and 3D as 1D) to global memory, I’ve now moved on to the next set of CPU-to-GPU copies I make: asynchronous constant memory copies.

So, being a naïf, I decided to do the usual thing and experiment with cudaMemcpyToSymbol before adding the Async. Unfortunately, I can’t seem to get it quite right. To wit, here is what I have, modified from the code Brent provided:

module test3d
   integer*4, device, allocatable :: x(:,:,:)
   integer, constant :: d_constant
contains
   attributes(global) subroutine s1()
      i = threadidx%x
      j = threadidx%y
      k = blockidx%x
      x(i,j,k) = x(i,j,k) + i + j + k + d_constant
   end subroutine s1
end module

program t

use test3d
use cudafor

integer, parameter :: N = 20
integer*4, allocatable, pinned :: h(:,:,:)
type(dim3) :: ngrid, nblock
integer :: h_constant
!
allocate(h(N,N,N))
allocate(x(N,N,N))
!
h = 1
h_constant = 5
!
ngrid = dim3(N,1,1)
nblock= dim3(N,N,1)

istat = cudaMemcpyAsync(x,h,N*N*N)
if (istat .ne. 0) print *,"cudaMemcpyAsync 1 ",istat
istat = cudaThreadSynchronize()
istat = cudaMemcpyToSymbol(d_constant,h_constant,1)
if (istat .ne. 0) print *,"cudaMemcpyToSymbol ",istat
call s1 <<<ngrid, nblock>>> ()

istat = cudaMemcpyAsync(h,x,N*N*N)
if (istat .ne. 0) print *,"cudaMemcpyAsync 2 ",istat
istat = cudaThreadSynchronize()

do k = 1, N
   do j = 1, N
      do i = 1, N
         if (h(i,j,k) .ne. 1+i+j+k+h_constant) then
            print *,"Error at ",i,j,k
         end if
      end do
   end do
end do 

end program t

where I’ve added h_constant, d_constant, the symbol copy, and the extra code that adds the constant in the appropriate places.
But when I try to compile:

> pgfortran test3d.cuf drivert.cuf
test3d.cuf:
drivert.cuf:
PGF90-S-0155-Could not resolve generic procedure cudamemcpytosymbol (drivert.cuf: 23)
  0 inform,   0 warnings,   1 severes, 0 fatal for t

Hmm, okay. So I tried various cudaMemcpyToSymbol calls, thinking I might need to pass a string (à la C), or size*kind rather than the element count, or even the offset (whose purpose I’m a bit unclear on; padding?):

istat = cudaMemcpyToSymbol(d_constant,h_constant,1)
istat = cudaMemcpyToSymbol(d_constant,h_constant,4)
istat = cudaMemcpyToSymbol("d_constant",h_constant,1)
istat = cudaMemcpyToSymbol("d_constant",h_constant,4)
istat = cudaMemcpyToSymbol(d_constant,h_constant,1,0)
istat = cudaMemcpyToSymbol("d_constant",h_constant,1,0)

All gave the same error as above.

So…help? Does using “integer, constant” declare d_constant as a plain integer rather than as a type(cudaSymbol)? If so, I can see why I’d get that error, which I usually associate with type mismatches.

(Note: This is to say nothing of the inevitable next question. If I have two (or more) constants, do I need to explicitly manage the “offset” argument of cudaMemcpyToSymbolAsync?)
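For comparison, here is a minimal sketch of the assignment route that CUDA Fortran also supports for constant memory: host code assigns directly to a module variable declared with the `constant` attribute, and the compiler generates the host-to-constant copy. It assumes a CUDA Fortran compiler (pgfortran or nvfortran) and a `.cuf` source file; the names `consts`, `c_a`, `c_b`, `d_out`, and `add_consts` are hypothetical. Each constant is its own variable, so no offset bookkeeping appears. Plain assignment is a synchronous copy, so this sketch does not address the Async half of the question.

```fortran
! Minimal sketch; assumes a CUDA Fortran compiler and a .cuf source file.
! All module, variable, and kernel names here are hypothetical.
module consts
   implicit none
   integer, constant :: c_a, c_b                ! two independent constant-memory scalars
   integer, device, allocatable :: d_out(:)
contains
   attributes(global) subroutine add_consts()
      integer :: i
      i = threadidx%x
      d_out(i) = i + c_a + c_b                  ! every thread reads both constants
   end subroutine add_consts
end module consts

program demo
   use cudafor
   use consts
   implicit none
   integer, parameter :: n = 8
   integer :: h(n), i, istat, nerr

   allocate(d_out(n))
   c_a = 5        ! host assignment: the compiler emits the host-to-constant copy
   c_b = 100      ! separate variable, so there is no offset to manage
   call add_consts<<<1, n>>>()
   istat = cudaDeviceSynchronize()
   if (istat /= 0) print *, "cudaDeviceSynchronize ", istat
   h = d_out      ! device-to-host copy by assignment
   nerr = 0
   do i = 1, n
      if (h(i) /= i + 105) then
         print *, "Error at ", i, h(i)
         nerr = nerr + 1
      end if
   end do
   if (nerr == 0) print *, "ok"
   deallocate(d_out)
end program demo
```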

Thanks,
Matt