How to get CUPTI metric values for a CUDA program with more than one kernel execution?

manredd · July 2, 2014, 4:14pm

Dear CUDA CUPTI developers,

I have a question about using the CUPTI API. Please let me know if this is not the right forum for it.

Please forgive me for this verbose post.

I am using the ‘callback_metric’ sample in the directory extras/CUPTI/sample to study the CUPTI API. This sample shows how to use both the callback and metric APIs to record a metric’s events during the execution of a simple kernel, and then use those events to calculate the metric value.

I am using a Quadro K4000 GPU and the CUDA 6.0 toolkit for my experimental runs on Ubuntu Linux 14.04.

The restriction in this sample is that the CUDA program being profiled for CUPTI metrics and events must contain exactly one kernel execution.

I would like to know if this example can be reused to get CUPTI metrics for a CUDA program with more than one kernel execution. For example, if I use the same approach (as in the sample callback_metric) to run the simpleCUFFT example or the radixsortThrust example, I get the following error:

error: too many events collected, metric expects only 2
error: too many events collected, metric expects only 2
…

Eventually I get other errors, and my application (which collects all the compute capability 3.x metrics) grinds to a halt. The problem appears to be that these programs launch more than one kernel, which the sample does not seem to support.

One approach is to repeat the sample's setup around each kernel launch, as follows:

// setup launch callback for event collection
// allocate space to hold all the events needed for the metric
// get the number of passes required to collect all the events
// needed for the metric and the event groups for each pass
execute_kernel_A(…);
// use all the collected events to calculate the metric value

// setup launch callback for event collection
// allocate space to hold all the events needed for the metric
// get the number of passes required to collect all the events
// needed for the metric and the event groups for each pass
execute_kernel_B(…);
// use all the collected events to calculate the metric value

However, this approach gives me metric values for the individual kernel executions, not for the CUDA program that contains them. It is also incorrect to simply sum the metric values of all the kernel executions, since many metrics are ratios or rates rather than additive counts.
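To illustrate why the per-kernel values cannot simply be added, here is a minimal sketch of a ratio metric (instructions per cycle). It does not call CUPTI at all. The struct `KernelEvents`, the functions `ipc`, `programIpc`, and `summedIpc`, and the numbers used are all hypothetical and exist only for this illustration. It assumes the raw event counts for each launch have already been collected, and that the metric I want for the whole program is the ratio of the summed events.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical raw event counts for one kernel launch (illustrative only,
// not CUPTI types). In a real tool these would be read from the event
// groups inside the launch callback.
struct KernelEvents {
    uint64_t inst_executed;  // instructions executed during the launch
    uint64_t active_cycles;  // cycles the launch was active
};

// Ratio metric for one set of event counts: instructions per cycle.
double ipc(const KernelEvents& e) {
    if (e.active_cycles == 0) return 0.0;  // avoid division by zero
    return static_cast<double>(e.inst_executed) /
           static_cast<double>(e.active_cycles);
}

// Program-level value: add up the raw events over all launches first,
// then evaluate the ratio once on the totals.
double programIpc(const std::vector<KernelEvents>& launches) {
    KernelEvents total{0, 0};
    for (const KernelEvents& e : launches) {
        total.inst_executed += e.inst_executed;
        total.active_cycles += e.active_cycles;
    }
    return ipc(total);
}

// The incorrect alternative: adding up the per-kernel metric values.
double summedIpc(const std::vector<KernelEvents>& launches) {
    double sum = 0.0;
    for (const KernelEvents& e : launches) sum += ipc(e);
    return sum;
}
```

For two made-up launches with (1000 instructions, 500 cycles) and (300 instructions, 300 cycles), `programIpc` gives 1300 / 800 = 1.625, while `summedIpc` gives 2.0 + 1.0 = 3.0. So what I would need is a way to accumulate the events across all kernel launches and evaluate the metric once at the end.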

Could you please let me know if there is a simple method to get these metric values for a CUDA program containing more than one kernel execution?

Best Regards
Ravi
