I’m studying how Nsight Compute computes the Roofline chart, so I looked at $CUDA_HOME/nsight-compute/sections/SpeedOfLight_RooflineChart.section. I understand the other top-level fields, such as Metrics and MetricDefinitions, which are described in the Customization Guide.
However, I don’t understand the usage rules for Body, which I can’t find anywhere in that documentation. For example, the sub-item Rooflines is defined as follows:
Rooflines {
  PeakWork {
    ValueCyclesPerSecondExpression {
      ValuePerCycleMetrics {
        Name: "derived__sm__sass_thread_inst_executed_op_ffma_pred_on_x2"
      }
      CyclesPerSecondMetric {
        Name: "sm__cycles_elapsed.avg.per_second"
      }
    }
  }
  PeakTraffic {
    ValueCyclesPerSecondExpression {
      ValuePerCycleMetrics {
        Name: "dram__bytes.sum.peak_sustained"
      }
      CyclesPerSecondMetric {
        Name: "dram__cycles_elapsed.avg.per_second"
      }
    }
  }
  Options {
    Label: "Single Precision Roofline"
  }
}
It seems that PeakWork computes the peak throughput (op/s) and PeakTraffic the peak bandwidth (B/s). Their quotient seems to be the arithmetic intensity (op/B).
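To make my reading concrete, here is a minimal Python sketch of the arithmetic I think is implied. It assumes that ValueCyclesPerSecondExpression means "per-cycle value times cycles per second", which I am only inferring from the field names and have not seen documented. The function names and all numbers below are hypothetical placeholders, not values from a real GPU or report.

```python
def peak_rate(value_per_cycle: float, cycles_per_second: float) -> float:
    """Assumed meaning of ValueCyclesPerSecondExpression:
    a per-cycle value multiplied by the clock rate of its unit."""
    return value_per_cycle * cycles_per_second


# Hypothetical metric values, chosen only to keep the arithmetic easy to follow.
ffma_x2_per_cycle = 8192.0      # stands in for the derived FFMA x2 metric (op/cycle)
sm_cycles_per_second = 1.5e9    # stands in for sm__cycles_elapsed.avg.per_second
dram_bytes_per_cycle = 512.0    # stands in for dram__bytes.sum.peak_sustained (B/cycle)
dram_cycles_per_second = 2.0e9  # stands in for dram__cycles_elapsed.avg.per_second

peak_work = peak_rate(ffma_x2_per_cycle, sm_cycles_per_second)          # op/s
peak_traffic = peak_rate(dram_bytes_per_cycle, dram_cycles_per_second)  # B/s

# Intensity at which the memory-bound and compute-bound regions meet (op/B).
ridge_intensity = peak_work / peak_traffic


def attainable(intensity: float) -> float:
    """Classic roofline bound: limited by bandwidth below the ridge, by compute above it."""
    return min(peak_work, intensity * peak_traffic)


print(f"peak work:    {peak_work:.4e} op/s")
print(f"peak traffic: {peak_traffic:.4e} B/s")
print(f"ridge point:  {ridge_intensity:.2f} op/B")
```

If this reading is right, the quotient is PeakWork over PeakTraffic, and it marks the ridge point of the roofline rather than the intensity of any particular kernel.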
My question is: how does the ncu command line know that it should divide PeakWork by PeakTraffic? PeakWork, ValueCyclesPerSecondExpression, CyclesPerSecondMetric, and so on seem to be undefined in the documentation I referenced.