Device-to-device memory copy?

Hi everyone,

Suppose I have an array that is already allocated in device memory, and I need to write two kernels that process it. Will CUDA automatically copy the array between the kernels with a device-to-device memcpy, or not?

I hope to see your comments about device-to-device memcpy.

Thanks very much,

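There is no need to copy the memory between the two kernels, and CUDA does not do so automatically. Memory allocated in device (global) memory stays valid across kernel launches until you free it, so just pass the same pointer to both kernels: the buffer the first kernel writes its output to becomes the input of the second kernel.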
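Here is a minimal sketch of that idea, in case it helps. The kernel names (scale_kernel, offset_kernel), the helper process_on_device, and the array size are all made up for illustration; they are not from your code. It assumes both kernels are launched in the default stream, where launches run in the order they were issued, so the second kernel sees the results of the first without any explicit copy or synchronization in between.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical first kernel: doubles every element in place.
__global__ void scale_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical second kernel: adds 1 to every element in place.
__global__ void offset_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Hypothetical helper: uploads the array once, runs both kernels on the
// same device buffer, and downloads the result. Returns the first error.
cudaError_t process_on_device(float* host, int n) {
    float* d_data = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    cudaError_t err = cudaMalloc(&d_data, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(d_data, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;

        // Both launches receive the same pointer. No device-to-device
        // copy happens (or is needed) between them.
        scale_kernel<<<blocks, threads>>>(d_data, n);
        offset_kernel<<<blocks, threads>>>(d_data, n);
        err = cudaGetLastError();  // catches launch errors
    }
    if (err == cudaSuccess) {
        // A blocking cudaMemcpy in the default stream waits for the
        // preceding kernels to finish before copying.
        err = cudaMemcpy(host, d_data, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_data);
    return err;
}

int main() {
    const int n = 8;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    cudaError_t err = process_on_device(host, n);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    // Each element should now hold 2 * i + 1.
    for (int i = 0; i < n; ++i) printf("%g ", host[i]);
    printf("\n");
    return 0;
}
```

An explicit cudaMemcpy with cudaMemcpyDeviceToDevice is only needed if you actually want a second copy of the data, for example to keep the first kernel's output unchanged while the second kernel modifies the array in place.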