Getting single precision utilization across entire application

I am trying to get the single-precision FP utilization for an entire application. I can use the Nsight Compute CLI to get this for each kernel using smsp__pipe_fma_cycles_active.avg.pct_of_peak_sustained_active, but there is no information on how these kernels relate to each other. Is there an easier way to approach this than finding the number of cycles for each kernel and weighting the per-kernel values by it? In other words, I would like an application-wide summary instead of a kernel-specific one.

Unfortunately, we do not provide application-wide analysis in Nsight Compute. The easiest way to achieve this would likely be to collect only the two metrics you require, i.e. the utilization metric and the cycle count used to weight it (e.g. using the --metrics flag on the command line), and output the results using

--csv --page raw --units base

You should be able to easily drop this into a spreadsheet application or script to do the required calculations.
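As a rough illustration of the script route, here is a minimal Python sketch that computes a cycle-weighted average from the exported CSV. It rests on a few assumptions that are not confirmed above: that the second metric collected is a cycle count such as smsp__cycles_active.avg, that the raw page has one row per kernel with one column per metric, and that any non-numeric row (such as a units row) and any log lines starting with "==" can be skipped. The function name weighted_utilization is made up for this example.

```python
import csv
import io

UTIL = "smsp__pipe_fma_cycles_active.avg.pct_of_peak_sustained_active"
CYCLES = "smsp__cycles_active.avg"  # assumption: the cycle count used as weight


def _to_float(text):
    """Return text as a float, or None if it is missing or not numeric."""
    try:
        return float(text.replace(",", ""))
    except (ValueError, AttributeError):
        return None


def weighted_utilization(csv_text, util_col=UTIL, cycles_col=CYCLES):
    """Cycle-weighted mean of the per-kernel utilization percentages."""
    # Assumption: profiler log lines start with "==" and are not CSV data.
    lines = [ln for ln in csv_text.splitlines() if not ln.startswith("==")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    weighted_sum = 0.0
    total_cycles = 0.0
    for row in reader:
        util = _to_float(row.get(util_col))
        cycles = _to_float(row.get(cycles_col))
        if util is None or cycles is None:
            continue  # e.g. a units row, or a kernel missing one of the metrics
        weighted_sum += util * cycles
        total_cycles += cycles
    if total_cycles == 0:
        raise ValueError("no kernel rows with both metrics were found")
    return weighted_sum / total_cycles
```

To use it, read the saved CSV into a string and pass it to weighted_utilization. Weighting by active cycles matches the "sustained_active" normalization of the utilization metric; if you prefer to weight by elapsed time, pass a different cycles_col.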

Alternatively, you can write custom Python-based rules using the NvRules API that would be executed for each kernel when using the --apply-rules flag (https://docs.nvidia.com/nsight-compute/CustomizationGuide/index.html#rule-system).

Rules can currently only access metrics for a single kernel (an “action” in the API) at a time, but you could store intermediate results e.g. locally on disk.
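A minimal sketch of what such a rule might look like follows, loosely modelled on the rule template in the Customization Guide. The assumptions are these: the weighting metric is again smsp__cycles_active.avg, both metrics are collected for every kernel, and the log file name, the FMA_UTIL_LOG environment variable and the helpers record_kernel and summarize are all hypothetical. The set of entry points a rule must define may differ between versions, so check it against the guide for your release.

```python
import json
import os

UTIL = "smsp__pipe_fma_cycles_active.avg.pct_of_peak_sustained_active"
CYCLES = "smsp__cycles_active.avg"  # assumption: cycle count used as weight
# Hypothetical location for the intermediate per-kernel results.
LOG_PATH = os.environ.get("FMA_UTIL_LOG", "fma_util_per_kernel.jsonl")


def record_kernel(path, name, util_pct, cycles):
    """Append one kernel's values to the log as a single JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        rec = {"kernel": name, "util_pct": util_pct, "cycles": cycles}
        f.write(json.dumps(rec) + "\n")


def summarize(path):
    """Cycle-weighted mean utilization over all kernels recorded so far."""
    weighted = 0.0
    total = 0.0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            weighted += rec["util_pct"] * rec["cycles"]
            total += rec["cycles"]
    if total == 0:
        raise ValueError("no cycles recorded")
    return weighted / total


# Entry points called by the rule system (names follow the guide's template).
def get_identifier():
    return "AppWideFmaUtil"


def get_name():
    return "Application-wide FMA utilization"


def get_description():
    return "Logs per-kernel FMA utilization and cycles for later aggregation."


def apply(handle):
    import NvRules  # only available when the rule runs inside Nsight Compute

    ctx = NvRules.get_context(handle)
    action = ctx.range_by_idx(0).action_by_idx(0)  # the single kernel in scope
    util = action.metric_by_name(UTIL)
    cycles = action.metric_by_name(CYCLES)
    if util is None or cycles is None:
        return  # one of the metrics was not collected for this kernel
    record_kernel(LOG_PATH, action.name(), util.as_double(), cycles.as_double())
```

After the profiling run, calling summarize on the log file from a separate script gives the application-wide figure. The log should be deleted between runs, since the rule only ever appends to it.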