Sync Copy Engine vs Async Copy Engine in Nsight Systems GPU Metrics

Hello, I am trying to understand the meaning of the “Sync Copy Engine Active” and “Async Copy Engine Active” metrics shown in Nsight Systems.

In the Nsight Systems 2024.5 release notes, I found the following description:

Copy engines are also exposed in the general metrics-set to better understand GPU activity for some architectures such as NVIDIA Ada Architecture. Synchronous copy engines are used in the graphics command sequences. Async copy engines are used in both compute and graphics to copy resources (typically) in the background.

I am using NVIDIA RTX 4000 Blackwell GPUs, and I can also see both Async Copy Engine Active and Sync Copy Engine Active in Nsight Systems.

However, I am still confused about how I should interpret these metrics.

My question is:

  • Should I understand this as
    “the GPU has copy engines, and Nsight Systems classifies copy-engine activity into sync and async categories”?

or

  • Does this mean
    “there are physically distinct types of copy engines, namely sync copy engines and async copy engines”?

If there is any official documentation or architectural explanation clarifying this, I would really appreciate it.

Thank you very much for your time and help.

“Sync” and “Async” are misnomers in CUDA use cases; the names come from graphics use cases.

In CUDA, all copy engines are asynchronous to the GR (graphics/compute) engine. On most GPUs, each copy engine can report when it is active, but the PM system has no way to collect a byte count. The CUDA memory copy trace can be used to understand when specific tasks are in flight.

Thank you again for your earlier reply.

I have one follow-up question about Nsight Systems.

In my microbenchmark, for the p2p_ce case, I enqueue the transfer on a destination GPU stream and call:

cudaMemcpyPeerAsync(d_dst, args.dst, d_src, args.src, args.bytes, s);

So the copy is issued from the destination GPU side in software.
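
For context, here is a minimal sketch of this issuing pattern. It is not the actual benchmark code: the helper name peer_copy_on_dst_stream, the buffer setup, and the error handling are assumptions for illustration, and only the cudaMemcpyPeerAsync call mirrors the line above. It shows where the copy is enqueued in software (a stream created on the destination device), which is separate from the question of which GPU's copy engine performs the transfer.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper (not from the benchmark): allocates one buffer per
// device and enqueues a peer copy on a stream owned by the destination device.
// Returns cudaSuccess, or the first CUDA error encountered.
cudaError_t peer_copy_on_dst_stream(int dst_dev, int src_dev, size_t bytes) {
    void* d_src = nullptr;
    void* d_dst = nullptr;
    cudaStream_t s = nullptr;
    cudaError_t err = cudaSuccess;

    // Records the status of each call and reports whether it succeeded.
    auto ok = [&](cudaError_t e) { err = e; return e == cudaSuccess; };

    if (ok(cudaSetDevice(src_dev)) && ok(cudaMalloc(&d_src, bytes)) &&
        ok(cudaSetDevice(dst_dev)) && ok(cudaMalloc(&d_dst, bytes)) &&
        ok(cudaStreamCreate(&s))) {  // stream belongs to the destination device
        // Enable direct peer access from dst to src when the hardware allows it.
        // Without it, the runtime may stage the copy through host memory.
        int can_access = 0;
        if (cudaDeviceCanAccessPeer(&can_access, dst_dev, src_dev) == cudaSuccess &&
            can_access) {
            if (cudaDeviceEnablePeerAccess(src_dev, 0) ==
                cudaErrorPeerAccessAlreadyEnabled) {
                cudaGetLastError();  // already enabled: clear the recorded error
            }
        }
        // Same call shape as in the post: enqueued on the destination-side stream.
        if (ok(cudaMemcpyPeerAsync(d_dst, dst_dev, d_src, src_dev, bytes, s))) {
            ok(cudaStreamSynchronize(s));
        }
    }

    // Release whatever was created, on the device that owns it.
    if (s)     { cudaSetDevice(dst_dev); cudaStreamDestroy(s); }
    if (d_dst) { cudaSetDevice(dst_dev); cudaFree(d_dst); }
    if (d_src) { cudaSetDevice(src_dev); cudaFree(d_src); }
    return err;
}
```

Calling peer_copy_on_dst_stream(1, 0, 1 << 20) under a profiler should produce one peer memcpy entry in the CUDA trace for the stream created on device 1.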

In such a case, is there any way in Nsight Systems to determine whether the actual transfer is carried out by the source GPU’s copy engine or the destination GPU’s copy engine?

If Nsight Systems cannot reveal that level of ownership, please let me know as well.

Thank you again for your time and help.