How to gather metrics per NVLink?

I noticed that the NVLink metrics have the device as their scope. The nvlink sample provided with CUPTI exercises only two GPUs.
When an application has multiple GPU-to-GPU and CPU-to-GPU transfers, what is the best way to extract the metrics for each NVLink separately? For example, I would like to get ‘nvlink_receive_throughput’ for GPU0 -> GPU1, GPU0 -> GPU2, and so on.

I noticed that ‘nvprof’ provides this information. How can I obtain it from CUPTI?


The following steps can be used to gather NVLINK data, i.e. topology and metrics. The two can then be correlated to achieve the requested behavior.

  1. Find the NVLINK topology using the CUPTI activity kind CUPTI_ACTIVITY_KIND_NVLINK, which reports it in the form of records. Each record describes one NVLINK: the type of device at each end (GPU or CPU) and the corresponding ports. Note: a port number of '-1' indicates that no NVLINK is connected to the corresponding port.
  2. Profile NVLINK metrics on multiple devices. This step collects NVLINK profiling data from the required devices:
     1. Create a context on each device to profile.
     2. Create eventGroups/eventGroupSets using the respective contexts.
     3. Set the attribute CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES using the API cuptiEventGroupSetAttribute() to get values from all the NVLINK instances on a device separately.
     4. Enable each eventGroup to profile events on each device.
     5. Run the CUDA APIs/kernels that need to be profiled.
     6. Read the event values from all eventGroups on all the devices and populate an event value array, which will hold per-instance data.
     7. Collect the NVLINK metric values per instance from all the event value arrays.

Metric values can be correlated with the NVLINK records.
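
To illustrate the correlation step, here is a minimal sketch, not the attached patch. It deliberately avoids the CUPTI headers so it can be compiled on its own. `LinkRecord` is a hypothetical, simplified stand-in for an NVLINK topology record (devices are plain indices here), and `valuesPerPeer` is a made-up helper name. It also assumes that entry i of the per-instance value array belongs to port i on that device; that mapping is an assumption you should verify on your system.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified stand-in for one NVLINK topology record.
// Devices are identified by plain indices in this sketch.
struct LinkRecord {
    int dev0;                     // index of the first endpoint
    int dev1;                     // index of the second endpoint
    std::array<int, 4> portDev0;  // ports used on dev0, -1 = no NVLINK connected
    std::array<int, 4> portDev1;  // ports used on dev1, -1 = no NVLINK connected
};

// Group the per-instance values of `device` by the peer each port leads to.
// Assumption: perInstance[i] is the value collected for port i of `device`.
std::map<int, std::uint64_t> valuesPerPeer(
        int device,
        const std::vector<LinkRecord>& links,
        const std::vector<std::uint64_t>& perInstance) {
    std::map<int, std::uint64_t> totals;
    for (const LinkRecord& link : links) {
        const bool isDev0 = (link.dev0 == device);
        const bool isDev1 = (link.dev1 == device);
        if (isDev0 == isDev1) continue;  // record does not involve `device` (or is a loopback)
        const int peer = isDev0 ? link.dev1 : link.dev0;
        const std::array<int, 4>& ports = isDev0 ? link.portDev0 : link.portDev1;
        for (int port : ports) {
            if (port < 0) continue;  // -1 marks an unused port slot
            if (static_cast<std::size_t>(port) >= perInstance.size()) continue;
            totals[peer] += perInstance[static_cast<std::size_t>(port)];
        }
    }
    return totals;
}
```

Calling `valuesPerPeer(0, links, gpu0Values)` would then give one total per peer of GPU0 (GPU1, GPU2, and so on), which is the per-link breakdown asked for above. Summing over ports is only meaningful for additive values such as byte counts; for a throughput metric you would combine the per-instance values in whatever way that metric requires.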

Refer to the CUPTI sample event_multi_gpu for collecting events from different devices/GPUs simultaneously.
I have attached a patch, for reference, that correlates NVLINK records (devices and ports) with NVLINK metrics for two devices.

Thanks for your reply. In the patch that you provided I see an error when I compile it:
class "CUpti_ActivityNvLink" has no member "uuidDev0"

Did you mean to use idDev0 and idDev1?