cudaMemcpy2DAsync compilation error

Hi all,

I am trying to use cudaMemcpy2DAsync but I run into a compilation error.

The syntax is cudaMemcpy2DAsync(dst, dpitch, src, spitch, width, height, kdir, stream).

So in my program I have

istat=cudaMemcpy2DAsync(loop_d,nloopmax,loop,nloopmax,nloop,looplenmax,cudaMemcpyHostToDevice,stream4)

where nloopmax is the pitch of the 2D arrays loop_d (device) and loop (host), and nloop and looplenmax are the width and height of the array block I want to transfer. I use stream4 for this operation.

The error I have is
PGF90-S-0155-Could not resolve generic procedure cudamemcpy2dasync

But if I remove the stream argument, the call compiles without errors.
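
For reference, here is a minimal sketch of the call in a self-contained program. The declarations are not shown above, so everything other than the call itself is an assumption: the integer element type, the array sizes, the pinned host array, and in particular the declaration of stream4 as integer(kind=cuda_stream_kind). This is a sketch under those assumptions, not the actual program.

```fortran
! Minimal CUDA Fortran sketch (build with a CUDA Fortran compiler,
! e.g. pgfortran -Mcuda or nvfortran -cuda).
! Assumed, not from the original program: integer element type, the
! array sizes, pinned host memory, and the declaration of stream4.
program memcpy2d_async_sketch
  use cudafor
  implicit none
  integer, parameter :: nloopmax = 64, looplenmax = 16   ! assumed sizes
  integer :: nloop, istat
  integer, allocatable, pinned :: loop(:,:)    ! page-locked host array
  integer, allocatable, device :: loop_d(:,:)  ! device array
  integer(kind=cuda_stream_kind) :: stream4    ! stream kind defined by cudafor

  nloop = 32                                   ! assumed block width (<= nloopmax)
  allocate(loop(nloopmax, looplenmax))
  allocate(loop_d(nloopmax, looplenmax))
  loop = 1

  istat = cudaStreamCreate(stream4)

  ! Pitch, width, and height are passed as element counts here
  ! (my understanding of the cudafor interface, not bytes as in CUDA C).
  istat = cudaMemcpy2DAsync(loop_d, nloopmax, loop, nloopmax, &
                            nloop, looplenmax, cudaMemcpyHostToDevice, stream4)
  if (istat /= cudaSuccess) print *, cudaGetErrorString(istat)

  istat = cudaStreamSynchronize(stream4)       ! wait for the copy to finish
  istat = cudaStreamDestroy(stream4)
  deallocate(loop, loop_d)
end program memcpy2d_async_sketch
```

Since the error comes from generic resolution, the declared type and kind of each actual argument (including the stream variable) is probably the first thing to compare against a sketch like this.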

Please help me figure out what the error is.

Thanks,

Lam