I am trying to use cudaMemcpy2DAsync but I run into a compilation error.
The syntax is cudaMemcpy2DAsync(dst, dpitch, src, spitch, width, height, kdir, stream).
So in my program I have
Here nloopmax is the pitch of the 2D arrays loop_d and loop, and nloop and looplenmax are the width and height of the array block I want to transfer. I use stream4 for this operation.
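For reference, a minimal sketch of a call with this argument layout might look like the following. The original code is not shown, so the element type (integer), the array sizes, the pinned host array, and the copy direction are all assumptions made for illustration. The sketch also assumes the stream variable is declared with kind cuda_stream_kind from the cudafor module; a stream declared as a default integer is one thing that could plausibly keep the generic interface from resolving, though that is a guess and not a confirmed diagnosis.

```fortran
program copy2d_sketch
  use cudafor
  implicit none

  ! Hypothetical sizes; the original post does not give them.
  integer, parameter :: nloopmax = 64, looplenmax = 32
  integer :: nloop, istat

  ! Assumed element type (integer). The host array is pinned so the
  ! asynchronous copy can overlap with other work.
  integer, pinned, allocatable :: loop(:,:)
  integer, device, allocatable :: loop_d(:,:)

  ! Assumption: the stream argument must have kind cuda_stream_kind.
  integer(kind=cuda_stream_kind) :: stream4

  nloop = 48
  allocate(loop(nloopmax, looplenmax))
  allocate(loop_d(nloopmax, looplenmax))
  loop = 1

  istat = cudaStreamCreate(stream4)

  ! Pitches, width, and height are given in elements here, not bytes.
  istat = cudaMemcpy2DAsync(loop_d(1,1), nloopmax, loop(1,1), nloopmax, &
                            nloop, looplenmax, cudaMemcpyHostToDevice, stream4)
  if (istat /= cudaSuccess) print *, cudaGetErrorString(istat)

  ! Wait for the copy to finish before touching the data.
  istat = cudaStreamSynchronize(stream4)
  istat = cudaStreamDestroy(stream4)
  deallocate(loop, loop_d)
end program copy2d_sketch
```

If the declaration of stream4 in the real program differs from the one above, comparing the two would be a reasonable first check.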
The error I have is
PGF90-S-0155-Could not resolve generic procedure cudamemcpy2dasync
If I remove the stream argument, the code compiles without errors.
Please help me figure out what the error is.