The CUDA documentation on the async copy (https://docs.nvidia.com/cuda/cuda-runtime-api/api-sync-behavior.html#api-sync-behavior__memcpy-async) says: "For all other transfers, the function is fully asynchronous. If pageable memory must first be staged to pinned memory, this will be handled asynchronously with a worker thread." I then ran some tests with two streams, one for cudaMemcpyAsync host-to-device (H2D) copies and another for kernel computation, with no dependency between the two streams. The memcpy appears to be synchronous rather than asynchronous, and I don't know why. Thanks.
cudaMemcpyAsync will be synchronous if the transfer is to or from pageable memory. See here:
Async memory copies will also be synchronous if they involve host memory that is not page-locked.
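As a rough illustration of the point about page-locked memory, here is a minimal sketch of a two-stream test that allocates the host buffer with cudaMallocHost instead of malloc. The kernel busyKernel, the buffer names, and the sizes are made up for the example and are not taken from the original test; whether the copy and the kernel actually overlap still depends on the device having a free copy engine.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the error and leave main() if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Hypothetical kernel standing in for the real compute work.
__global__ void busyKernel(float *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 1000; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

int main() {
    const size_t n = 1 << 24;                 // assumed element count
    const size_t bytes = n * sizeof(float);

    // Page-locked (pinned) host buffer; a malloc'd buffer would be pageable.
    float *hPinned = nullptr;
    CHECK(cudaMallocHost((void **)&hPinned, bytes));
    for (size_t i = 0; i < n; ++i) hPinned[i] = 1.0f;

    // Separate device buffers so the two streams share no data.
    float *dCopy = nullptr, *dCompute = nullptr;
    CHECK(cudaMalloc((void **)&dCopy, bytes));
    CHECK(cudaMalloc((void **)&dCompute, bytes));
    CHECK(cudaMemset(dCompute, 0, bytes));

    cudaStream_t copyStream, computeStream;
    CHECK(cudaStreamCreate(&copyStream));
    CHECK(cudaStreamCreate(&computeStream));

    // Kernel in one stream, H2D copy in the other; no dependency between them.
    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);
    busyKernel<<<blocks, threads, 0, computeStream>>>(dCompute, n);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(dCopy, hPinned, bytes,
                          cudaMemcpyHostToDevice, copyStream));

    // Wait for both streams before releasing resources.
    CHECK(cudaStreamSynchronize(copyStream));
    CHECK(cudaStreamSynchronize(computeStream));

    CHECK(cudaStreamDestroy(copyStream));
    CHECK(cudaStreamDestroy(computeStream));
    CHECK(cudaFree(dCopy));
    CHECK(cudaFree(dCompute));
    CHECK(cudaFreeHost(hPinned));
    std::printf("done\n");
    return 0;
}
```

Running this under a timeline profiler such as Nsight Systems, once as written and once with hPinned allocated by malloc, should make it visible whether the copy overlaps the kernel in each case.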
Thanks. According to the CUDA Programming Guide, cudaMemcpyAsync is asynchronous for pageable memory when the data size is less than 64 KB. For other sizes it is synchronous.