Why do we have so many copy routines?


I have always wondered why we have so many copy routines.
Wouldn’t it make sense to have only one copy routine for synchronous data transfer and one for asynchronous transfer?

If I look at the CUDA_MEMCPY2D and CUDA_MEMCPY3D structures, they seem to be almost equivalent.

There is neither a clear structure nor clear documentation on what to use when. According to the documentation, cuMemcpy2D and cuMemcpy3D
are meant for CUDA arrays, although I can also use them for linear memory. Can I also use them for host memory or page-locked memory?

What behaviour would I get if I dropped the 1D functions entirely (for example cuMemcpyDtoH) and used only the 3D structure to copy my memory?

It would be very nice to have a detailed explanation on this.