Can we use CUPTI for run-time analysis of GPU metrics in CUDA applications?

I want to develop a tool for GPU metric analysis of CUDA executables, or use an existing one. The tool's purpose is to report a user-specified metric while the CUDA executable is running, once per second and separately for each kernel. I thought I could use CUPTI for this, but I'm not sure. What can I do?

CUPTI does not support metric sampling. It can only collect metrics as aggregated values, either for each kernel run or for an application range.

You can look at the Nsight Systems GPU metric sampling feature.
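As a rough illustration of that suggestion, here is a minimal sketch of a small Python helper that builds an `nsys profile` command line with GPU metric sampling turned on. The helper names (`build_nsys_command`, `run_profile`), the application path, and the report name are hypothetical. The `--gpu-metrics-device` and `--gpu-metrics-frequency` options are the ones I know from the Nsight Systems CLI, but the exact spelling can differ between releases (newer ones may use `--gpu-metrics-devices`), so please check `nsys profile --help` on your installation.

```python
import shlex
import subprocess


def build_nsys_command(app, app_args=(), frequency_hz=10000,
                       output="gpu_metrics_report"):
    """Build an `nsys profile` command with GPU metric sampling enabled.

    Assumption: the installed nsys accepts --gpu-metrics-device and
    --gpu-metrics-frequency; verify with `nsys profile --help`.
    """
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    return [
        "nsys", "profile",
        "--gpu-metrics-device=all",                 # sample every supported GPU
        f"--gpu-metrics-frequency={frequency_hz}",  # sampling rate in Hz
        "-o", output,                               # report file name
        app, *app_args,                             # application and its arguments
    ]


def run_profile(cmd):
    """Run the profiling command; requires nsys on PATH and a supported GPU."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "./my_cuda_app" is a placeholder for your own CUDA executable.
    cmd = build_nsys_command("./my_cuda_app")
    print(shlex.join(cmd))  # only prints the command; call run_profile(cmd) to execute
```

As far as I know, the sampled values are written into the generated report and viewed on the timeline afterwards, not printed to the console while the application runs.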

As far as I understand, the CUPTI Profiling API can gather performance metrics during kernel execution. I need to retrieve metric values on a per-second basis using this API. I came across the callback_profiling sample in the CUPTI examples, which lets me extract the value of a specific metric. However, I am unable to retrieve this metric value continuously, once per second. Let me show an example usage of the callback_profiling code:


For example, how can I print the metric I showed above to the console once per second?
I'm also curious about the two files, Simple_Cupti.data and Simple_Cupti.dataSB, that are generated once the application has finished executing. Do these files contain information that reflects the metric's value every second?
My last question: can I use ranges to collect metrics at specified times?

You cannot use CUPTI APIs for this.

No, ranges cannot be used for this.