Getting single precision utilization across entire application

steven69 · May 17, 2019, 3:06pm

I am trying to get the single-precision FP utilization for an entire application. With Nsight Compute's CLI I can get it for each kernel using smsp__pipe_fma_cycles_active.avg.pct_of_peak_sustained_active,
but there is no information on how these kernels relate to each other. Is there an easier approach than finding the number of cycles for each kernel and weighting the per-kernel values by it? In other words, I want an application-wide summary instead of a per-kernel one.
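
Written out, the cycle weighting described above is a weighted mean. Here u_k stands for kernel k's value of smsp__pipe_fma_cycles_active.avg.pct_of_peak_sustained_active and c_k for the cycle count used as that kernel's weight (which cycle metric to use is left open here):

```latex
U_{\text{app}} = \frac{\sum_{k} u_k \, c_k}{\sum_{k} c_k}
```

For example, one kernel at 50% for 1,000 cycles and another at 10% for 3,000 cycles give (50 × 1000 + 10 × 3000) / 4000 = 20%. Time between kernels is not part of either sum, so this is utilization over kernel execution time, not over the application's wall-clock time.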

felix_dt · May 21, 2019, 9:15am

Unfortunately, we do not provide application-wide analysis in Nsight Compute. The easiest way to achieve this would likely be to collect only the two metrics you need (the utilization metric and a per-kernel cycle count, e.g. using the --metrics flag on the command line) and print the results using

--csv --page raw --units base

You should be able to easily drop this into a spreadsheet application or script to do the required calculations.
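
A minimal sketch of the script route, assuming the profile was written with something like `ncu --metrics smsp__pipe_fma_cycles_active.avg.pct_of_peak_sustained_active,gpc__cycles_elapsed.max --csv --page raw --units base ./my_app > metrics.csv` (`ncu` was named `nv-nsight-cu-cli` in 2019-era releases; `./my_app` is a placeholder, and using `gpc__cycles_elapsed.max` as the cycle weight is an assumption, so swap in whichever cycle metric you collected). It also assumes the CSV has a header row of column names, possibly followed by a units row, and that any tool log lines start with `==`. The exact layout may vary between versions.

```python
import csv
import io
from typing import Optional, TextIO

# UTIL_METRIC is the metric from the question; CYCLES_METRIC is an assumed
# choice of per-kernel cycle count used as the weight.
UTIL_METRIC = "smsp__pipe_fma_cycles_active.avg.pct_of_peak_sustained_active"
CYCLES_METRIC = "gpc__cycles_elapsed.max"


def _to_float(text: str) -> Optional[float]:
    """Parse a CSV cell as a number; return None for unit labels or blanks."""
    try:
        # Strip thousands separators in case values are printed as "1,234".
        return float(text.replace(",", ""))
    except ValueError:
        return None


def weighted_utilization(stream: TextIO,
                         util_col: str = UTIL_METRIC,
                         cycles_col: str = CYCLES_METRIC) -> float:
    """Cycle-weighted mean of a per-kernel percentage over all kernel rows."""
    # Drop tool log lines (e.g. "==PROF== ...") that may precede the CSV data.
    lines = [ln for ln in stream if not ln.startswith("==")]
    weighted_sum = 0.0
    total_cycles = 0.0
    for row in csv.DictReader(lines):
        util = _to_float(row[util_col])
        cycles = _to_float(row[cycles_col])
        if util is None or cycles is None:
            continue  # e.g. the units row that follows the header
        weighted_sum += util * cycles
        total_cycles += cycles
    if total_cycles == 0:
        raise ValueError("no kernel rows with both metrics were found")
    return weighted_sum / total_cycles


if __name__ == "__main__":
    # Made-up numbers, only to show the input shape the sketch expects.
    sample = io.StringIO(
        '"ID","Kernel Name","gpc__cycles_elapsed.max",'
        '"smsp__pipe_fma_cycles_active.avg.pct_of_peak_sustained_active"\n'
        '"","","cycle","%"\n'
        '"0","kernelA","1000","50"\n'
        '"1","kernelB","3000","10"\n'
    )
    print(weighted_utilization(sample))  # (50*1000 + 10*3000) / 4000 = 20.0
```

To use it on a real profile, open the CSV file and pass the file object to `weighted_utilization`.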

Alternatively, you can write custom Python-based rules using the NvRules API; they are executed for each kernel when you pass the --apply-rules flag ( https://docs.nvidia.com/nsight-compute/CustomizationGuide/index.html#rule-system )

Rules can currently access metrics for only one kernel (an “action” in the API) at a time, but you could store intermediate results, e.g. locally on disk.
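
A minimal sketch of the "store intermediate results on disk" idea in plain Python. The names `record_kernel`, `summarize`, and the JSON Lines file are hypothetical. In an actual rule, the apply function would read the two metric values for the current action through the NvRules API described in the Customization Guide and pass them to `record_kernel`; `summarize` would run once after profiling finishes. The NvRules calls are left out so the sketch runs on its own.

```python
import json
import os
import tempfile
from typing import Tuple

# Hypothetical file shared by all per-kernel rule invocations.
DEFAULT_PATH = "fma_util_per_kernel.jsonl"


def record_kernel(name: str, util_pct: float, cycles: float,
                  path: str = DEFAULT_PATH) -> None:
    """Append one kernel's metrics as a single JSON line (once per kernel)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"name": name, "util": util_pct,
                            "cycles": cycles}) + "\n")


def summarize(path: str = DEFAULT_PATH) -> Tuple[int, float]:
    """Return (kernel count, cycle-weighted utilization) from the stored lines."""
    count, weighted_sum, total_cycles = 0, 0.0, 0.0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            rec = json.loads(line)
            count += 1
            weighted_sum += rec["util"] * rec["cycles"]
            total_cycles += rec["cycles"]
    return count, (weighted_sum / total_cycles if total_cycles else 0.0)


if __name__ == "__main__":
    # Use a fresh temporary file so results from earlier runs are not mixed in.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
    record_kernel("kernelA", 50.0, 1000.0, path=demo_path)
    record_kernel("kernelB", 10.0, 3000.0, path=demo_path)
    print(summarize(demo_path))  # (2, 20.0)
```

Appending one line per kernel keeps each rule invocation independent; just delete or rename the file before a new profiling run so old kernels are not counted twice.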
