Hi everyone,
Suppose I have an array that is already allocated in device memory, and I need to write two kernels to process it. I wonder whether CUDA will automatically copy this array through a device-to-device memcpy (cudaMemcpyDeviceToDevice) when the kernels use it, or not?
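To make the question concrete, here is a minimal sketch of the situation I mean. The kernel names `scale` and `add_one` and the helper `run_two_kernels` are made up for illustration. Both kernels receive the same device pointer, and I never call cudaMemcpy with cudaMemcpyDeviceToDevice myself:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical first kernel: doubles every element in place.
__global__ void scale(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= 2.0f;
}

// Hypothetical second kernel: adds 1 to every element in place.
__global__ void add_one(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1.0f;
}

// Returns 0 if every element ends up as 2*i + 1, nonzero otherwise.
int run_two_kernels(int n) {
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    // The array is allocated once in device memory.
    float *d_a = nullptr;
    if (cudaMalloc(&d_a, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemcpy(d_a, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Both launches are given the same pointer d_a; there is no explicit
    // device-to-device copy anywhere between them.
    scale<<<blocks, threads>>>(d_a, n);
    add_one<<<blocks, threads>>>(d_a, n);

    // This blocking copy on the default stream waits for both kernels.
    cudaError_t err = cudaMemcpy(h.data(), d_a, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    if (err != cudaSuccess) return 2;

    for (int i = 0; i < n; ++i)
        if (h[i] != 2.0f * i + 1.0f) return 3;
    return 0;
}

int main() {
    int rc = run_two_kernels(1024);
    std::printf(rc == 0 ? "ok\n" : "failed (code %d)\n", rc);
    return rc;
}
```

My question is whether anything gets copied behind the scenes between the two launches in code like this.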
Hope to see your comments about device-to-device memcpy.
Thanks very much,