CUPTI 12.8, profiling of NVLink metrics using Profiler Host API and Range Profiler API

According to the CUDA 12.8 release notes, “profiling of NVLink metrics is now supported using profiler host API and range profiler APIs.”

Would it be possible to get a relevant example?

I tried to retrieve NVLink metrics such as nvlrx__bytes.sum on a DGX A100 server using the range_profiling and userrange_profiling samples.

In range_profiling, I encountered:
cuptiProfilerHostConfigAddMetrics() failed with error(999): CUPTI_ERROR_UNKNOWN

In userrange_profiling, I encountered:
NVPM_RawMetricsConfig_AddMetrics() with error NVPA_STATUS_ERROR

Neither of these error codes provides any useful information.

Other metrics, such as PCIe metrics, are being retrieved successfully—only NVLink metrics seem to be the issue.


After changing CUpti_ProfilerType from CUPTI_PROFILER_TYPE_RANGE_PROFILER to CUPTI_PROFILER_TYPE_PM_SAMPLING, this error disappeared.

So, is there a CUPTI profiler type that supports NVLink metrics the way PM sampling does?

Thanks for trying out CUPTI profiling APIs.

NVLink metrics are supported only with the new Range Profiler APIs (cuptiRangeProfiler*). The legacy cuptiProfiler* APIs do not support profiling NVLink metrics, which are device-level metrics.

The range_profiling sample uses the new Range Profiler and new Host APIs, so it is expected to support NVLink metric collection. Note that this sample doesn't perform any NVLink-specific operations, so the counter values will be zero.

We are able to reproduce the issue you have reported. To unblock yourself, skip querying the counter availability image when collecting NVLink metrics and pass an empty vector as the counter availability image input instead. With that change, collection will work.
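As a rough illustration of the workaround, here is a minimal C++ sketch of how the choice of counter availability image could be made in the application before the host-side configuration is created. The helper names (IsNvlinkMetric, HasNvlinkMetric, SelectCounterAvailabilityImage) are hypothetical and not part of CUPTI or the sample, and the prefix check assumes NVLink metric names start with nvlrx__ or nvltx__, as nvlrx__bytes.sum does. No CUPTI calls are made here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption: NVLink metric names begin with "nvlrx__" or "nvltx__",
// as in nvlrx__bytes.sum. Adjust the prefixes if your metric list differs.
inline bool IsNvlinkMetric(const char* metricName)
{
    assert(metricName != nullptr);
    return std::strncmp(metricName, "nvlrx__", 7) == 0 ||
           std::strncmp(metricName, "nvltx__", 7) == 0;
}

// True if at least one requested metric looks like an NVLink metric.
inline bool HasNvlinkMetric(const std::vector<const char*>& metricNames)
{
    for (const char* name : metricNames)
    {
        if (IsNvlinkMetric(name))
        {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: returns the counter availability image to hand to the
// host-side configuration step. For NVLink metrics it returns an empty vector
// (the workaround described above); otherwise it returns the image that was
// queried from the target.
inline std::vector<uint8_t> SelectCounterAvailabilityImage(
    const std::vector<const char*>& metricNames,
    const std::vector<uint8_t>& queriedImage)
{
    if (HasNvlinkMetric(metricNames))
    {
        return {};
    }
    return queriedImage;
}
```

In the range_profiling sample, the returned vector would take the place of the counter availability image that the sample currently queries and passes along before cuptiProfilerHostConfigAddMetrics() is called. In practice you can also skip the availability query entirely when the metric list contains NVLink metrics.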

Note: collecting both device and context level metrics at the same time is not supported.

The userrange_profiling sample uses the old cuptiProfiler* APIs, so it is not expected to work for NVLink metrics.