How to calculate a kernel's arithmetic intensity with the Nsight Compute Command Line Interface

Hello,
I’m just getting started with ncu. I know that the Nsight Compute GUI can easily plot the roofline model. With the Nsight Compute CLI (ncu), however, how can I get the arithmetic intensity (FLOP/byte)? Since ncu collects various metrics, I guess the arithmetic intensity of a kernel can be calculated from some specific metrics. Does anyone have ideas about this? Thanks in advance.
Update 1: According to the Nsight Compute GUI, arithmetic intensity = kernel work / kernel traffic. So the question is how to get peak work and peak traffic with ncu. These two do not seem to be native metrics.

You can refer to the .section files associated with the roofline model you are interested in for the input metrics and any multipliers for them. These files are located in your Nsight Compute installation directory under “sections”. After the first launch, they will also be deployed to your user Documents directory (so that they can be changed and extended by users for customization purposes).
E.g.

C:\Users\nvidia\Documents\NVIDIA Nsight Compute\2022.2.0\Sections\SpeedOfLight_RooflineChart.section

Start from a roofline chart entry of your choice and look up any derived metrics at the top of the file.

  Label: "Floating Point Operations Roofline"
      AxisIntensity {
        Label: "Arithmetic Intensity [FLOP/byte]"
      }
      AxisWork {
        Label: "Performance [FLOP/s]"
      }
      Rooflines {
        PeakWork {
          ValueCyclesPerSecondExpression {
            ValuePerCycleMetrics {
              Name: "derived__sm__sass_thread_inst_executed_op_ffma_pred_on_x2"
            }
            CyclesPerSecondMetric {
              Name: "sm__cycles_elapsed.avg.per_second"
            }
          }
        }
        PeakTraffic {
          ValueCyclesPerSecondExpression {
            ValuePerCycleMetrics {
              Name: "dram__bytes.sum.peak_sustained"
            }
            CyclesPerSecondMetric {
              Name: "dram__cycles_elapsed.avg.per_second"
            }
          }
        }
        Options {
          Label: "Single Precision Roofline"
        }
      }
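
To collect these inputs on the command line, the metric names can be pulled out of the section file and passed to ncu's --metrics option. Below is a minimal Python sketch, assuming the file keeps the `Name: "..."` layout shown above. `metric_names` is a hypothetical helper, not part of Nsight Compute. Names starting with `derived__` are defined at the top of the section file, so they may have to be replaced by the metrics they are built from before ncu accepts them.

```python
import re

# Excerpt in the layout shown above; in practice, read the .section file from disk.
SECTION_TEXT = '''
PeakWork {
  ValueCyclesPerSecondExpression {
    ValuePerCycleMetrics {
      Name: "derived__sm__sass_thread_inst_executed_op_ffma_pred_on_x2"
    }
    CyclesPerSecondMetric {
      Name: "sm__cycles_elapsed.avg.per_second"
    }
  }
}
PeakTraffic {
  ValueCyclesPerSecondExpression {
    ValuePerCycleMetrics {
      Name: "dram__bytes.sum.peak_sustained"
    }
    CyclesPerSecondMetric {
      Name: "dram__cycles_elapsed.avg.per_second"
    }
  }
}
'''

def metric_names(section_text):
    """Return the unique metric names in order of first appearance."""
    # \b keeps fields such as DisplayName from matching.
    names = re.findall(r'\bName:\s*"([^"]+)"', section_text)
    return list(dict.fromkeys(names))

names = metric_names(SECTION_TEXT)
print(",".join(names))
```

The printed comma-separated list can then be used as `ncu --metrics <list> ./my_app`, where `./my_app` is a placeholder for your application.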

Note that in future versions of the tool, you will be able to print the roofline results directly on the command line, too.
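
Until then, the intensity can be computed by hand from collected values, since arithmetic intensity is work (FLOP/s) divided by traffic (byte/s). Here is a rough Python sketch for single precision. It assumes you have collected per-cycle FADD, FMUL and FFMA thread-instruction counts, the SM clock rate, and DRAM bytes per second. The metric names for these achieved values are not part of the excerpt above and should be taken from the section file. The function name and the numbers are made up for illustration.

```python
def arithmetic_intensity(fadd_per_cycle, fmul_per_cycle, ffma_per_cycle,
                         sm_cycles_per_second, dram_bytes_per_second):
    """FLOP/byte = achieved FLOP/s divided by achieved DRAM byte/s."""
    # An FFMA counts as two floating-point operations (multiply + add),
    # which is what the "_x2" suffix of the derived metric suggests.
    flop_per_cycle = fadd_per_cycle + fmul_per_cycle + 2 * ffma_per_cycle
    flop_per_second = flop_per_cycle * sm_cycles_per_second
    return flop_per_second / dram_bytes_per_second

# Illustrative values only, not measurements.
ai = arithmetic_intensity(
    fadd_per_cycle=100.0,
    fmul_per_cycle=50.0,
    ffma_per_cycle=425.0,
    sm_cycles_per_second=1.0e9,
    dram_bytes_per_second=2.0e11,
)
print(f"{ai:.2f} FLOP/byte")
```

With these inputs the kernel does 1000 FLOP per cycle at 1 GHz, so 1e12 FLOP/s over 2e11 byte/s gives 5 FLOP/byte.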


Thanks for your great reply! But I’m still confused by this .section file. Actually, I hardly understand this file and don’t know how it works. I googled what a *.section file is but couldn’t find a clear definition.
“so that they can be changed and extended by users for customization purposes” Does this mean users can modify this *.section file, for example to incorporate half-precision floating-point add, mul and fma, so that Nsight Compute takes all floating-point operations into consideration? (As far as I know, the arithmetic intensity calculation doesn’t incorporate half-precision floating point yet.)

Section files are used as input by Nsight Compute to define which metrics are collected and how they are represented in the output. Users can modify them or add new ones to define their own collections of metrics.

Some section files use specialized elements, such as roofline charts, which aren’t a simple collection of metrics (as, e.g., in a Header element), but where each collected metric fills a certain purpose and takes part in derived computations to create the output (roofline). This means that while you could add your own charts for the FP variants you mentioned, they are already shipped as part of Nsight Compute in their own section files as SpeedOfLight_HierarchicalDouble/Half/Single/TensorRooflineChart.section. You can select them for collection on the CLI with the --section flag or in the UI’s Section/Rules Info tool window.
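
As an illustration of the CLI selection, here is a small Python sketch that assembles such an ncu command line. It assumes that each section identifier equals the file name without the .section extension and that --section may be given more than once. `roofline_command` and `./my_app` are hypothetical names.

```python
import shlex

# Variants named in the shipped section files listed above.
PRECISIONS = ("Double", "Half", "Single", "Tensor")

def roofline_command(app, precisions=("Single",)):
    """Build an ncu argument list that collects the chosen roofline sections."""
    cmd = ["ncu"]
    for precision in precisions:
        if precision not in PRECISIONS:
            raise ValueError(f"unknown roofline variant: {precision}")
        # Assumption: identifier == file name without ".section".
        cmd += ["--section", f"SpeedOfLight_Hierarchical{precision}RooflineChart"]
    cmd.append(app)
    return cmd

cmd = roofline_command("./my_app", ("Single", "Half"))
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where ncu is installed.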
