This thread cannot be much of a surprise as I stumble my way through using the CUDA API with CUDA Fortran. After figuring out, thanks to Brent, how to do asynchronous copies to global memory of 1D and 2D data (and 3D as 1D), I’ve now moved on to the next set of CPU-to-GPU copies I make: asynchronous constant memory copies.
So, being a naïf, I decided to do the usual and experiment with cudaMemcpyToSymbol before adding the Async. Unfortunately, I can’t seem to get it just right. To wit, I have, modifying the code Brent provided:
    module test3d
      integer*4, device, allocatable :: x(:,:,:)
      integer, constant :: d_constant
    contains
      attributes(global) subroutine s1()
        i = threadidx%x
        j = threadidx%y
        k = blockidx%x
        x(i,j,k) = x(i,j,k) + i + j + k + d_constant
      end subroutine s1
    end module

    program t
      use test3d
      use cudafor
      integer, parameter :: N = 20
      integer*4, allocatable, pinned :: h(:,:,:)
      type(dim3) :: ngrid, nblock
      integer :: h_constant
      !
      allocate(h(N,N,N))
      allocate(x(N,N,N))
      !
      h = 1
      h_constant = 5
      !
      ngrid = dim3(N,1,1)
      nblock= dim3(N,N,1)
      istat = cudaMemcpyAsync(x,h,N*N*N)
      if (istat .ne. 0) print *,"cudaMemcpyAsync 1 ",istat
      istat = cudaThreadSynchronize()
      istat = cudaMemcpyToSymbol(d_constant,h_constant,1)
      if (istat .ne. 0) print *,"cudaMemcpyToSymbol ",istat
      call s1 <<<ngrid, nblock>>> ()
      istat = cudaMemcpyAsync(h,x,N*N*N)
      if (istat .ne. 0) print *,"cudaMemcpyAsync 2 ",istat
      istat = cudaThreadSynchronize()
      do k = 1, N
        do j = 1, N
          do i = 1, N
            if (h(i,j,k) .ne. 1+i+j+k+h_constant) then
              print *,"Error at ",i,j,k
            end if
          end do
        end do
      end do
    end program t
where I’ve added h_constant, d_constant, the Symbol copy, and the extra code adding the constant in appropriate places.
But, when I try to compile:
    > pgfortran test3d.cuf drivert.cuf
    test3d.cuf:
    drivert.cuf:
    PGF90-S-0155-Could not resolve generic procedure cudamemcpytosymbol (drivert.cuf: 23)
      0 inform, 0 warnings, 1 severes, 0 fatal for t
Hmm, okay. So I tried various cudaMemcpyToSymbol calls, thinking I needed to pass a string (à la C), or size*kind rather than size, or even the offset (whose purpose I’m a bit unclear on…padding?):
    istat = cudaMemcpyToSymbol(d_constant,h_constant,1)
    istat = cudaMemcpyToSymbol(d_constant,h_constant,4)
    istat = cudaMemcpyToSymbol("d_constant",h_constant,1)
    istat = cudaMemcpyToSymbol("d_constant",h_constant,4)
    istat = cudaMemcpyToSymbol(d_constant,h_constant,1,0)
    istat = cudaMemcpyToSymbol("d_constant",h_constant,1,0)
All gave the same error as above.
So…help? Does using “integer, constant” declare d_constant as an integer and not as type(cudaSymbol)? If so, I guess I could see that error (which I usually associate with typing issues).
(Note: This is to say nothing of the inevitable next question. If I have two (or more) constants, do I need to explicitly manage the “offset” part of cudaMemcpyToSymbolAsync?)
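For comparison, here is a minimal sketch of another way to get a host value into constant memory in CUDA Fortran. It assumes the compiler accepts plain assignment from host code to a module variable declared with the constant attribute, which is how the PGI CUDA Fortran documentation describes host-to-device transfers. The names const_demo, add_const, and the 1D array are hypothetical and only loosely mirror the code above. It is a synchronous copy, so it says nothing about the Async variant or about why the generic fails to resolve.

```fortran
module const_demo
  use cudafor
  integer, constant :: d_constant            ! lives in GPU constant memory
  integer, device, allocatable :: x(:)       ! lives in GPU global memory
contains
  attributes(global) subroutine add_const()
    integer :: i
    i = threadidx%x
    x(i) = x(i) + d_constant                 ! kernels read constant data, never write it
  end subroutine add_const
end module const_demo

program demo
  use cudafor
  use const_demo
  implicit none
  integer, parameter :: N = 32
  integer :: h(N)

  allocate(x(N))
  h = 1
  x = h                                      ! host-to-device copy by assignment
  d_constant = 5                             ! host-to-constant copy by assignment (assumed supported)
  call add_const<<<1, N>>>()
  h = x                                      ! device-to-host copy; waits for the kernel first

  if (all(h == 6)) then
    print *, "ok"
  else
    print *, "mismatch"
  end if
  deallocate(x)
end program demo
```

On the offset question: in the CUDA C runtime API, the offset argument of cudaMemcpyToSymbol is a byte offset from the start of the one symbol being written (useful for updating part of a constant array). Each constant is its own symbol, so two separately declared constants would not need offsets relative to each other. Whether the CUDA Fortran interface follows the same convention is an assumption here.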