Hi all,
There are many CUDA memcpy API calls in our project, and while profiling them we found something interesting.
- Some async DtoH memcpy calls behave like synchronous ones: the API call lasts longer than the device-side memory copy and returns only after the actual copy has finished. (Profiling screenshot: https://ibb.co/tC46Hpt) We don't see this for any HtoD calls.
- Some memcpy operations are not associated with any driver API call. (Profiling screenshot: https://ibb.co/XbV9YjW)
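For the first observation, the post doesn't say how the host buffers are allocated, so this is an assumption worth ruling out: pageable host memory. The CUDA Runtime API notes on API synchronization behavior say that `cudaMemcpyAsync` between device memory and pageable host memory may be synchronous with respect to the host (older releases state that device-to-pageable-host copies return only once the copy has completed). Below is a minimal reproducer sketch that times how long a DtoH `cudaMemcpyAsync` takes to return control to the host, for pageable (`malloc`) versus pinned (`cudaMallocHost`) buffers. The function name `dtoh_async_return_ms` and its error convention are hypothetical, chosen only for illustration.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns how long (in ms) a DtoH cudaMemcpyAsync takes
// to return control to the host, or -1.0 if any CUDA call fails (e.g. no GPU).
// `pinned` selects page-locked (cudaMallocHost) vs pageable (malloc) memory.
double dtoh_async_return_ms(bool pinned, size_t bytes) {
    void* d_buf = nullptr;
    void* h_buf = nullptr;
    cudaStream_t stream = nullptr;
    double result = -1.0;

    if (cudaMalloc(&d_buf, bytes) != cudaSuccess) return -1.0;
    if (pinned) {
        if (cudaMallocHost(&h_buf, bytes) != cudaSuccess) h_buf = nullptr;
    } else {
        h_buf = std::malloc(bytes);
    }
    if (h_buf != nullptr && cudaStreamCreate(&stream) == cudaSuccess) {
        // Touch every host page first so page faults are not part of the timing.
        std::memset(h_buf, 0, bytes);
        // Let any earlier device work finish so only the copy call is timed.
        cudaDeviceSynchronize();
        auto t0 = std::chrono::steady_clock::now();
        cudaError_t err = cudaMemcpyAsync(h_buf, d_buf, bytes,
                                          cudaMemcpyDeviceToHost, stream);
        auto t1 = std::chrono::steady_clock::now();
        // Wait for the copy to complete before releasing the buffers.
        if (err == cudaSuccess && cudaStreamSynchronize(stream) == cudaSuccess) {
            result = std::chrono::duration<double, std::milli>(t1 - t0).count();
        }
        cudaStreamDestroy(stream);
    }
    if (h_buf != nullptr) {
        if (pinned) cudaFreeHost(h_buf); else std::free(h_buf);
    }
    cudaFree(d_buf);
    return result;
}
```

Calling `dtoh_async_return_ms(false, 64u << 20)` and `dtoh_async_return_ms(true, 64u << 20)` and comparing the two results should show whether pageable memory accounts for the blocking DtoH calls; exact timings depend on the GPU, driver, and CUDA version.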
How can we explain these observations?
Thanks.