I have two questions about cudaMemcpyAsync: one about how to read its measured time, and one about its naming.
- For D-to-H and H-to-D transfers, the host is blocked by cudaMemcpyAsync, and the duration that nsys reports for the call is the time the host is actually occupied. Is my understanding correct?
- What does "async" mean for cudaMemcpyAsync? (The API documentation calls the name a misnomer.)
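
For context, here is a minimal sketch of how the host-side blocking could be checked independently of nsys. It wraps each cudaMemcpyAsync call in a std::chrono timer and compares a pageable host buffer (malloc) with a pinned one (cudaMallocHost). The helper name hostTimeMs, the CHECK macro, and the 256 MiB buffer size are assumptions chosen for illustration only.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Hypothetical helper: host wall-clock time (ms) spent inside a single
// cudaMemcpyAsync call, i.e. until control returns to the host thread.
static double hostTimeMs(void* dst, const void* src, size_t bytes,
                         cudaMemcpyKind kind, cudaStream_t stream) {
    auto t0 = std::chrono::steady_clock::now();
    CHECK(cudaMemcpyAsync(dst, src, bytes, kind, stream));
    auto t1 = std::chrono::steady_clock::now();
    // Drain the stream so the next measurement starts from an idle state.
    CHECK(cudaStreamSynchronize(stream));
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const size_t bytes = static_cast<size_t>(256) << 20;  // 256 MiB (arbitrary)

    void* dev = nullptr;
    void* pinned = nullptr;
    CHECK(cudaMalloc(&dev, bytes));
    CHECK(cudaMallocHost(&pinned, bytes));   // page-locked host memory
    void* pageable = std::malloc(bytes);     // ordinary pageable host memory
    if (pageable == nullptr) return EXIT_FAILURE;

    // Touch both host buffers so page faults do not distort the timings.
    std::memset(pinned, 1, bytes);
    std::memset(pageable, 1, bytes);

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Warm-up copy so one-time initialization cost is not measured.
    hostTimeMs(dev, pinned, bytes, cudaMemcpyHostToDevice, stream);

    std::printf("H2D pageable: %.3f ms on host\n",
                hostTimeMs(dev, pageable, bytes, cudaMemcpyHostToDevice, stream));
    std::printf("H2D pinned:   %.3f ms on host\n",
                hostTimeMs(dev, pinned, bytes, cudaMemcpyHostToDevice, stream));
    std::printf("D2H pageable: %.3f ms on host\n",
                hostTimeMs(pageable, dev, bytes, cudaMemcpyDeviceToHost, stream));
    std::printf("D2H pinned:   %.3f ms on host\n",
                hostTimeMs(pinned, dev, bytes, cudaMemcpyDeviceToHost, stream));

    CHECK(cudaStreamDestroy(stream));
    std::free(pageable);
    CHECK(cudaFreeHost(pinned));
    CHECK(cudaFree(dev));
    return 0;
}
```

If the pinned-buffer calls return much faster than the pageable-buffer calls, that would be consistent with the linked page's note that copies involving pageable host memory may be synchronous with respect to the host. The absolute numbers will depend on the GPU, driver, and system.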
Reference
https://docs.nvidia.com/cuda/cuda-runtime-api/api-sync-behavior.html#api-sync-behavior