cuMemcpyDtoHAsync acts like a Synchronous Call

Hi all,

Our project makes many CUDA memcpy API calls, and the profiling results show two things we can't explain.

  1. Some async DtoH memcpy calls behave like synchronous calls: the API call lasts longer than the device memory copy itself and returns only after the copy has finished. (Profiling screenshot: https://ibb.co/tC46Hpt) We don't see this for HtoD calls.

  2. Some memcpy operations are not associated with any driver API call. (Profiling screenshot: https://ibb.co/XbV9YjW)

How can these be explained?

Thanks.

Is the host memory used in the async copy pinned? It needs to be pinned (page-locked) for the copy to be truly asynchronous.

Thanks! The memory is actually not pinned.
As for the memory copies that are not associated with any API call, how does that happen?