Is there any way to find out whether a program running on a 3080 (Ampere) uses async-copy or not? I mean via cuobjdump, nsight, …
You can use nvprof/nsight to see it. If there are many calls, both async and sync, you’d have to figure out which of them are relevant to what you are looking for.
nvprof is not the right tool for Ampere architecture GPUs.
Nsight Compute and Nsight Systems are the tools to use with Ampere. I’m not sure whether the cc 8.x async copy is captured by Nsight Compute; that would be a good question to pose in the Nsight Compute section of the forum.
Is it possible to move this thread, or do I have to open a new one?
This information about async-copies should be available from Nsight Systems.
I didn’t find that via Nsight Systems. If you know where I should look or which option I should use, please let me know.
In the Nsight Systems timeline, under the CUDA API trace, you can find the async memory copy APIs such as cudaMemcpyAsync.
There is a question though. The async-copy feature was released for Ampere [1, 2]; however, when I search for cudaMemcpyAsync I see several older posts. So I guess that what I am looking for is not cudaMemcpyAsync. Any thoughts about that?
OK, I found more information. As I read this blog, cudaMemcpyAsync is for asynchronously copying data between CPU memory and GPU global memory. So that is not what I am looking for, which is specific to Ampere and not available on previous generations.
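To make the distinction concrete, here is a minimal sketch of the host-side API discussed so far. It queues a host-to-device and a device-to-host copy on a stream with cudaMemcpyAsync; these are the calls that would show up in the CUDA API trace. The function name `roundtrip_with_memcpy_async` and the buffer size are made up for illustration, and error handling is kept deliberately thin.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: copies a buffer host -> device -> host on one stream
// and reports whether the data survived the round trip.
bool roundtrip_with_memcpy_async() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *h_src = nullptr, *h_dst = nullptr, *d_buf = nullptr;
    cudaStream_t stream = nullptr;

    // Pinned host memory is what allows these copies to run asynchronously.
    bool ok = cudaMallocHost(&h_src, bytes) == cudaSuccess &&
              cudaMallocHost(&h_dst, bytes) == cudaSuccess &&
              cudaMalloc(&d_buf, bytes) == cudaSuccess &&
              cudaStreamCreate(&stream) == cudaSuccess;

    if (ok) {
        for (int i = 0; i < n; ++i) { h_src[i] = static_cast<float>(i); h_dst[i] = -1.0f; }

        // Both calls return immediately; the copies execute in stream order.
        ok = cudaMemcpyAsync(d_buf, h_src, bytes, cudaMemcpyHostToDevice, stream) == cudaSuccess &&
             cudaMemcpyAsync(h_dst, d_buf, bytes, cudaMemcpyDeviceToHost, stream) == cudaSuccess &&
             cudaStreamSynchronize(stream) == cudaSuccess;

        for (int i = 0; ok && i < n; ++i) ok = (h_dst[i] == h_src[i]);
    }

    // Release whatever was successfully allocated.
    if (stream) cudaStreamDestroy(stream);
    if (d_buf) cudaFree(d_buf);
    if (h_dst) cudaFreeHost(h_dst);
    if (h_src) cudaFreeHost(h_src);
    return ok;
}

int main() {
    std::printf("%s\n", roundtrip_with_memcpy_async() ? "round trip ok" : "round trip failed");
    return 0;
}
```

Nothing in this sketch touches shared memory or requires Ampere, which is consistent with cudaMemcpyAsync appearing in posts that predate it.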
According to the programming guide, the memcpy_async API is available for the programmer to use. The question is: if the programmer doesn’t use it, will nvcc generate it on its own as an optimization?
If nvcc doesn’t add it to the SASS code as an optimization, then I can simply grep for memcpy_async in the source code. Otherwise the problem becomes harder, as I don’t know which SASS instructions are related to async-copy.
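For reference, a minimal sketch of what explicit use of the device-side API looks like, assuming the cooperative groups form of memcpy_async described in the programming guide. The kernel name `scale_with_async_copy`, the helper `run_scale_demo`, and the tile size are hypothetical; the kernel stages one tile of global memory into shared memory and then doubles each element.

```cuda
#include <cstdio>
#include <vector>
#include <cooperative_groups.h>
#include <cooperative_groups/memcpy_async.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

constexpr int kTile = 256;  // elements staged per block (illustrative choice)

__global__ void scale_with_async_copy(const float* in, float* out, int n) {
    __shared__ float tile[kTile];
    cg::thread_block block = cg::this_thread_block();

    int base = blockIdx.x * kTile;
    int count = min(kTile, n - base);  // the last block may hold a partial tile

    // Collective global -> shared copy issued by the whole block.
    cg::memcpy_async(block, tile, in + base, sizeof(float) * count);
    cg::wait(block);  // all threads wait until the staged data is visible

    int i = threadIdx.x;
    if (i < count) out[base + i] = 2.0f * tile[i];
}

// Hypothetical host driver: returns true if every output equals 2 * input.
bool run_scale_demo() {
    const int n = 1000;  // not a multiple of kTile, so the tail path is exercised
    const size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int blocks = (n + kTile - 1) / kTile;
        scale_with_async_copy<<<blocks, kTile>>>(d_in, d_out, n);
        // The blocking copy back also surfaces kernel launch/execution errors.
        ok = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == 2.0f * h_in[i]);

    if (d_out) cudaFree(d_out);
    if (d_in) cudaFree(d_in);
    return ok;
}

int main() {
    std::printf("%s\n", run_scale_demo() ? "results match" : "results differ");
    return 0;
}
```

Building this for the 3080 (for example with `nvcc -arch=sm_86`) and dumping the result with `cuobjdump -sass` gives a known-positive binary to compare against. At the PTX level the feature is the `cp.async` instruction, and the Ampere instruction set table in the CUDA Binary Utilities documentation lists `LDGSTS` as the asynchronous global-to-shared copy, so that opcode is the likely thing to grep for in SASS. Searching the SASS rather than the source also covers the case where the instruction comes from a library or from the compiler instead of an explicit memcpy_async call.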