I was reading your NVLink example. I can see that the NVLink metrics are collected per device context.
So does that mean I can't record the NVLink throughput for CPU-to-GPU and GPU-to-GPU at the same time?
What should I do if I want to get them at the same time?
cuptiMetricGetValue has a dev argument, to which I should pass the GPU device.
thanks,
The following steps can be used to gather NVLink data, i.e. topology and metrics; the two can then be correlated to achieve the requested behavior.
- Find the NVLink topology using the CUPTI activity kind CUPTI_ACTIVITY_KIND_NVLINK, which is delivered as activity records:
This step provides information about each NVLink, i.e. the device type (GPU or CPU) at each end of the link and the corresponding ports. A minimal sketch of this step is shown below.
Note: A port number of '-1' indicates that no NVLink is connected to the corresponding port.
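A minimal sketch of the topology step, assuming the CUpti_ActivityNvLink2 record version (the exact struct name depends on the CUPTI release) and omitting error checking:

```cpp
#include <cupti.h>
#include <cstdio>
#include <cstdlib>

#define BUF_SIZE (32 * 1024)

// CUPTI asks for a buffer to fill with activity records.
static void CUPTIAPI bufferRequested(uint8_t **buffer, size_t *size, size_t *maxNumRecords)
{
    *buffer = (uint8_t *)malloc(BUF_SIZE);
    *size = BUF_SIZE;
    *maxNumRecords = 0;
}

// CUPTI returns a filled buffer; walk it and pick out the NVLink records.
static void CUPTIAPI bufferCompleted(CUcontext ctx, uint32_t streamId,
                                     uint8_t *buffer, size_t size, size_t validSize)
{
    (void)ctx; (void)streamId; (void)size;
    CUpti_Activity *record = NULL;
    while (cuptiActivityGetNextRecord(buffer, validSize, &record) == CUPTI_SUCCESS) {
        if (record->kind == CUPTI_ACTIVITY_KIND_NVLINK) {
            // Record version assumed here; newer CUPTI releases add NvLink3/NvLink4.
            CUpti_ActivityNvLink2 *nvlink = (CUpti_ActivityNvLink2 *)record;
            printf("NVLink between %s and %s, dev0 ports:",
                   nvlink->typeDev0 == CUPTI_DEV_TYPE_GPU ? "GPU" : "CPU",
                   nvlink->typeDev1 == CUPTI_DEV_TYPE_GPU ? "GPU" : "CPU");
            for (int p = 0; p < CUPTI_MAX_NVLINK_PORTS; p++)
                if (nvlink->portDev0[p] != -1)   // -1: no NVLink on this port
                    printf(" %d", (int)nvlink->portDev0[p]);
            printf("\n");
        }
    }
    free(buffer);
}

// In main(), before creating contexts on the devices:
//   cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted);
//   cuptiActivityEnable(CUPTI_ACTIVITY_KIND_NVLINK);
//   ... create contexts / run work ...
//   cuptiActivityFlushAll(0);
```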
- Profile NVLink metrics on multiple devices (a sketch of these steps follows after this list):
This step involves collecting NVLink profiling data from the required devices.
  - Create a context on each device to be profiled.
  - Create eventGroups/eventGroupSets using the respective contexts.
  - Enable each eventGroup to profile events on each device.
  - Set the attribute CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES using the API cuptiEventGroupSetAttribute() to get values from all the NVLink instances on a device separately.
  - Run the CUDA APIs/kernels that need to be profiled.
  - Read event values from all eventGroups on all the devices and populate an event value array holding per-instance data.
  - Collect per-instance NVLink metric values from all the event value arrays.
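A hedged sketch of the metric-collection steps for one device/context (repeat per device). The metric name "nvlink_total_data_transmitted", the single-pass assumption, and the continuous collection mode are assumptions for illustration, not taken from the post; error checking is omitted:

```cpp
#include <cuda.h>
#include <cupti.h>
#include <cstdio>
#include <vector>

void collectNvlinkMetric(CUdevice device, CUcontext context, uint64_t durationNs)
{
    (void)durationNs;  // used in the cuptiMetricGetValue call sketched below

    // NVLink events are not tied to a single kernel, so continuous collection
    // mode is typically required (assumption for this sketch).
    cuptiSetEventCollectionMode(context, CUPTI_EVENT_COLLECTION_MODE_CONTINUOUS);

    CUpti_MetricID metricId;
    cuptiMetricGetIdFromName(device, "nvlink_total_data_transmitted", &metricId);

    // Create the event groups needed to collect this metric on this context.
    CUpti_EventGroupSets *passes = NULL;
    cuptiMetricCreateEventGroupSets(context, sizeof(metricId), &metricId, &passes);
    CUpti_EventGroupSet *set = &passes->sets[0];   // single-pass assumption

    uint32_t profileAll = 1;
    for (uint32_t g = 0; g < set->numEventGroups; g++) {
        // Report each NVLink instance separately instead of an aggregate value.
        cuptiEventGroupSetAttribute(set->eventGroups[g],
                                    CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES,
                                    sizeof(profileAll), &profileAll);
        cuptiEventGroupEnable(set->eventGroups[g]);
    }

    // ... run the CUDA work to be profiled (e.g. memcpy traffic over NVLink) ...

    for (uint32_t g = 0; g < set->numEventGroups; g++) {
        CUpti_EventGroup group = set->eventGroups[g];
        uint32_t numEvents = 0, numInstances = 0;
        size_t sz = sizeof(numEvents);
        cuptiEventGroupGetAttribute(group, CUPTI_EVENT_GROUP_ATTR_NUM_EVENTS, &sz, &numEvents);
        sz = sizeof(numInstances);
        cuptiEventGroupGetAttribute(group, CUPTI_EVENT_GROUP_ATTR_INSTANCE_COUNT, &sz, &numInstances);

        std::vector<CUpti_EventID> ids(numEvents);
        std::vector<uint64_t> values((size_t)numEvents * numInstances);
        size_t valueBytes = values.size() * sizeof(uint64_t);
        size_t idBytes = ids.size() * sizeof(CUpti_EventID);
        size_t numIdsRead = 0;
        // values[] holds one entry per (event, instance); each instance is one NVLink.
        cuptiEventGroupReadAllEvents(group, CUPTI_EVENT_READ_FLAG_NONE,
                                     &valueBytes, values.data(),
                                     &idBytes, ids.data(), &numIdsRead);

        // For a per-instance metric value, pass that instance's event values to
        // cuptiMetricGetValue(); for an overall value, sum across instances first:
        // CUpti_MetricValue metricValue;
        // cuptiMetricGetValue(device, metricId,
        //                     numEvents * sizeof(CUpti_EventID), ids.data(),
        //                     numEvents * sizeof(uint64_t), perInstanceValues,
        //                     durationNs, &metricValue);

        cuptiEventGroupDisable(group);
    }
    cuptiEventGroupSetsDestroy(passes);
}
```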
The per-instance metric values can then be correlated with the NVLink topology records to separate GPU-to-GPU traffic from CPU-to-GPU traffic; a hypothetical correlation helper is sketched below.
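A hypothetical helper for the correlation. It assumes that event-domain instance i corresponds to NVLink port i of the topology record (this mapping is an assumption for illustration, not confirmed above; the attached sample shows the actual correlation):

```cpp
#include <cupti.h>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: print per-link values for dev0 of one topology record.
// Assumption: event-domain instance i corresponds to NVLink port i.
void printPerLinkValues(const CUpti_ActivityNvLink2 *nvlink,
                        const uint64_t *perInstanceValues, uint32_t numInstances)
{
    const char *peer = (nvlink->typeDev1 == CUPTI_DEV_TYPE_GPU) ? "GPU" : "CPU";
    for (uint32_t i = 0; i < numInstances && i < (uint32_t)CUPTI_MAX_NVLINK_PORTS; i++) {
        if (nvlink->portDev0[i] != -1)   // -1: no NVLink on this port
            printf("port %u (link to %s): %llu\n",
                   i, peer, (unsigned long long)perInstanceValues[i]);
    }
}
```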
Refer to the CUPTI sample event_multi_gpu (https://docs.nvidia.com/cuda/cupti/index.html#r_samples) for collecting events from multiple devices/GPUs simultaneously.
The attached patch is provided for reference; it correlates NVLink records (devices and ports) with NVLink metrics for two devices.
nvlink_bandwidth.zip (2.6 KB)