What's the use of driver API "cuMemcpyDtoDAsync()"?

Some documents say that a memory copy within the same device is asynchronous, so I assumed the API “cuMemcpyDtoD()” is asynchronous. If so, what is the use of the API “cuMemcpyDtoDAsync()”? Under what circumstances should I call “cuMemcpyDtoDAsync()”?

First of all, you would use it only when you are using the driver API. Second, you would (typically) use it when you want to copy data from one place in device global memory to another place in device global memory.

It is analogous to the runtime API function cudaMemcpyAsync, with the transfer kind “cudaMemcpyDeviceToDevice” specified.
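For comparison, here is a minimal sketch of that runtime API form. The buffer size, the fill value 0x5A, and the CHECK_CUDA macro are my own choices for illustration, not something from the documentation:

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime API call fails.
#define CHECK_CUDA(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

int main() {
    const size_t bytes = 1 << 20;  // arbitrary 1 MiB buffer
    void *src = nullptr, *dst = nullptr;
    cudaStream_t stream;

    CHECK_CUDA(cudaMalloc(&src, bytes));
    CHECK_CUDA(cudaMalloc(&dst, bytes));
    CHECK_CUDA(cudaStreamCreate(&stream));

    // Fill the source in the same stream so the copy is ordered after it.
    CHECK_CUDA(cudaMemsetAsync(src, 0x5A, bytes, stream));

    // Device-to-device copy queued on the stream; the host does not
    // have to wait for it here.
    CHECK_CUDA(cudaMemcpyAsync(dst, src, bytes,
                               cudaMemcpyDeviceToDevice, stream));

    // Wait for the stream before reading the result back.
    CHECK_CUDA(cudaStreamSynchronize(stream));

    std::vector<unsigned char> host(bytes);
    CHECK_CUDA(cudaMemcpy(host.data(), dst, bytes, cudaMemcpyDeviceToHost));

    bool ok = true;
    for (size_t i = 0; i < bytes; ++i) {
        if (host[i] != 0x5A) { ok = false; break; }
    }
    std::printf("copy %s\n", ok ? "verified" : "MISMATCH");

    CHECK_CUDA(cudaStreamDestroy(stream));
    CHECK_CUDA(cudaFree(dst));
    CHECK_CUDA(cudaFree(src));
    return ok ? 0 : 1;
}
```

This should build with something like `nvcc example.cu -o example`.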

cuMemcpyDtoD takes no stream argument, so it is always issued to the default stream, and you should treat it as a blocking API call. Note that the documentation:

http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g1725774abf8b51b91945f3336b778c8b

says:
“This function exhibits synchronous behavior for most use cases.”

cuMemcpyDtoDAsync can be a non-blocking call and can be issued to any stream, which is what you want when the copy should overlap with work in other streams. Note that the documentation:

http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g39ea09ba682b8eccc9c3e0c04319b5c8

says:

“This function exhibits asynchronous behavior for most use cases.”
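
To make the difference concrete, here is a minimal driver API sketch that issues cuMemcpyDtoDAsync on a non-default stream. The buffer size, the fill value 0xAB, the CHECK_CU macro, and the use of the primary context on device 0 are assumptions I made for illustration:

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda.h>

// Hypothetical helper: abort with a message if a driver API call fails.
#define CHECK_CU(call)                                             \
    do {                                                           \
        CUresult res_ = (call);                                    \
        if (res_ != CUDA_SUCCESS) {                                \
            const char *msg_ = nullptr;                            \
            cuGetErrorString(res_, &msg_);                         \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         msg_ ? msg_ : "unknown error");           \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

int main() {
    const size_t bytes = 1 << 20;  // arbitrary 1 MiB buffer

    // Driver API setup: initialize, then make device 0's primary
    // context current on this thread.
    CUdevice dev;
    CUcontext ctx;
    CHECK_CU(cuInit(0));
    CHECK_CU(cuDeviceGet(&dev, 0));
    CHECK_CU(cuDevicePrimaryCtxRetain(&ctx, dev));
    CHECK_CU(cuCtxSetCurrent(ctx));

    CUdeviceptr src = 0, dst = 0;
    CUstream stream;
    CHECK_CU(cuMemAlloc(&src, bytes));
    CHECK_CU(cuMemAlloc(&dst, bytes));
    CHECK_CU(cuStreamCreate(&stream, CU_STREAM_NON_BLOCKING));

    // Fill the source in the same stream so the copy is ordered after it.
    CHECK_CU(cuMemsetD8Async(src, 0xAB, bytes, stream));

    // The stream parameter is what cuMemcpyDtoD lacks: this copy is
    // queued on `stream` rather than on the default stream.
    CHECK_CU(cuMemcpyDtoDAsync(dst, src, bytes, stream));

    // The host could do unrelated work here; wait before using dst.
    CHECK_CU(cuStreamSynchronize(stream));

    std::vector<unsigned char> host(bytes);
    CHECK_CU(cuMemcpyDtoH(host.data(), dst, bytes));

    bool ok = true;
    for (size_t i = 0; i < bytes; ++i) {
        if (host[i] != 0xAB) { ok = false; break; }
    }
    std::printf("copy %s\n", ok ? "verified" : "MISMATCH");

    CHECK_CU(cuStreamDestroy(stream));
    CHECK_CU(cuMemFree(dst));
    CHECK_CU(cuMemFree(src));
    CHECK_CU(cuDevicePrimaryCtxRelease(dev));
    return ok ? 0 : 1;
}
```

This should build with something like `nvcc example.cu -lcuda -o example`. The explicit cuStreamSynchronize is the important part: since the call may return before the copy has finished, you have to synchronize (or use an event) before anything outside that stream depends on the destination buffer.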