Hi all,
I am trying to use cudaMemcpy2DAsync but am running into a compilation error.
The syntax is cudaMemcpy2DAsync(dst, dpitch, src, spitch, width, height, kdir, stream).
So in my program I have
istat = cudaMemcpy2DAsync(loop_d, nloopmax, loop, nloopmax, nloop, looplenmax, cudaMemcpyHostToDevice, stream4)
where
nloopmax is the pitch of the 2D arrays loop_d and loop,
nloop and looplenmax are the width and height of the array block I want to transfer, and stream4 is the stream I use for this operation.
The error I get is
PGF90-S-0155-Could not resolve generic procedure cudamemcpy2dasync
But if I remove the stream argument, the code compiles without errors.
Please help me figure out what is causing the error.
Thanks,
Lam