I noticed that the NVLink metrics are scoped to the device. In the NVLink sample provided with CUPTI, the kernel involves only two GPUs.
If our application has multiple GPU-to-GPU and CPU-to-GPU transfers, what is the best way to extract the metrics for each NVLink separately? For example, I would like to get ‘nvlink_receive_throughput’ for GPU0 -> GPU1, GPU0 -> GPU2, and so on.
I noticed that ‘nvprof’ provides this per-link information. How can I obtain the same information from CUPTI?