Gathering CPU-to-GPU and GPU-to-GPU NVLink transfers at the same time

I was reading your NVLink example. I can see that the NVLink metrics are collected per device context.
So does that mean I can't record the NVLink throughput for CPU-to-GPU and GPU-to-GPU transfers at the same time?

What should I do if I want to get both at the same time?
cuptiMetricGetValue has a dev argument, to which I should pass the GPU device.


The following steps can be used to gather NVLink data, i.e. topology and metrics, and the two can then be correlated to achieve the requested behavior.

  1. Find the NVLink topology using the CUPTI activity kind CUPTI_ACTIVITY_KIND_NVLINK, which is delivered in the form of activity records. Each record describes one NVLink: the type of device on each end (GPU or CPU) and the corresponding ports. Note: a port number of '-1' indicates that no NVLink is connected to that port.
  2. Profile NVLink metrics on multiple devices. This step involves collecting NVLink profiling data from each device of interest:
  3. Create context on devices to profile.
  4. Create eventGroups/eventGroupSets using respective contexts.
  5. Enable each eventGroup to profile events on each device.
  6. Set attribute CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES using API cuptiEventGroupSetAttribute() to get values from all the NVLINK instances separately on a device.
  7. Run CUDA APIs/kernel that need to be profiled.
  8. Read event values from all eventGroups on all the devices and populate the event value arrays, which will contain per-instance data.
  9. Collect NVLINK metric values per instance from all event value arrays.
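Assuming the legacy CUPTI event/metric API, steps 3 through 9 above might look roughly like the following sketch for one device (repeat per device to cover CPU-to-GPU and GPU-to-GPU links simultaneously). Error checking is omitted, the caller is assumed to have called cuInit, the metric is assumed to fit in a single pass, and "nvlink_total_data_transmitted" is just one example metric name; check the names your GPU exposes with cuptiDeviceEnumMetrics.

```cpp
#include <cupti.h>
#include <cuda.h>
#include <vector>

void profileNvlinkMetric(CUdevice dev, const char *metricName) {
  CUcontext ctx;
  cuCtxCreate(&ctx, 0, dev);                       // step 3: one context per device

  CUpti_MetricID metricId;
  cuptiMetricGetIdFromName(dev, metricName, &metricId);

  // step 4: event groups needed to compute this metric on this context
  CUpti_EventGroupSets *passes;
  cuptiMetricCreateEventGroupSets(ctx, sizeof(metricId), &metricId, &passes);

  // Assumption: the metric's events fit in a single pass (passes->numSets == 1).
  CUpti_EventGroupSet *set = &passes->sets[0];
  uint32_t profileAll = 1;
  for (uint32_t i = 0; i < set->numEventGroups; ++i) {
    CUpti_EventGroup grp = set->eventGroups[i];
    // step 6: collect values from every NVLink instance separately
    cuptiEventGroupSetAttribute(grp,
        CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES,
        sizeof(profileAll), &profileAll);
    cuptiEventGroupEnable(grp);                    // step 5
  }

  // step 7: launch the kernels / memcpys to be profiled here ...

  for (uint32_t i = 0; i < set->numEventGroups; ++i) {  // step 8
    CUpti_EventGroup grp = set->eventGroups[i];
    uint32_t numEvents = 0, numInstances = 0;
    size_t sz = sizeof(uint32_t);
    cuptiEventGroupGetAttribute(grp, CUPTI_EVENT_GROUP_ATTR_NUM_EVENTS,
                                &sz, &numEvents);
    cuptiEventGroupGetAttribute(grp, CUPTI_EVENT_GROUP_ATTR_INSTANCE_COUNT,
                                &sz, &numInstances);

    std::vector<uint64_t> values(numEvents * numInstances);
    std::vector<CUpti_EventID> ids(numEvents);
    size_t valueBytes = values.size() * sizeof(uint64_t);
    size_t idBytes = ids.size() * sizeof(CUpti_EventID);
    size_t numRead = 0;
    cuptiEventGroupReadAllEvents(grp, CUPTI_EVENT_READ_FLAG_NONE,
                                 &valueBytes, values.data(),
                                 &idBytes, ids.data(), &numRead);
    // values[] now holds one counter per NVLink instance; feed the
    // per-instance values into cuptiMetricGetValue (step 9).
    cuptiEventGroupDisable(grp);
  }
  cuptiEventGroupSetsDestroy(passes);
}
```

This requires a CUDA driver and an NVLink-capable GPU to actually run; it is meant only to show the order of the API calls in the steps above.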

Metric values can be correlated with the NVLINK records.

Refer to the CUPTI sample event_multi_gpu for collecting events from different devices/GPUs simultaneously.
The attached patch (2.6 KB) is provided for reference; it correlates NVLink records (devices and ports) with NVLink metrics for two devices.