I have a question about measuring cudaMemcpyAsync and about its naming.
My understanding is that for both device-to-host (D2H) and host-to-device (H2D) transfers, the host is blocked by cudaMemcpyAsync, so the duration nsys reports for the call is the time the host is actually occupied. Is my understanding correct?
What does "async" mean in cudaMemcpyAsync? (The API documentation says it is a misnomer.)