I want to make a roofline plot for publication, but I’m having trouble pulling the right counters out of ncu.
To locate the point on the diagram I need:
The FMA performance counter expressed as FLOPS/S. At present, I can only find this expressed as a percentage with sm__pipe_fma_cycles_active.avg.pct_of_peak_sustained_elapsed
The bytes / second counter. Again, I have only been able to find this expressed as a percentage with gpu__compute_memory_throughput.avg.pct_of_peak_sustained_elapsed
Which counters should I use for these values, or to derive them?
I also need the theoretical maximum values for both FMA performance and memory throughput. Is there a way for the device to report these values, or should I trust the advertised values on NVIDIA’s website?
The values reported by the tool are always for the current clock frequency of the GPU. This may differ from the clocks used or assumed for maximum values published elsewhere. By default, ncu locks GPU clocks to their base frequency. You can disable this and use nvidia-smi to lock clocks to a value you want. Measuring these multi-pass metrics without locked clocks is not recommended, because data from different passes may then no longer be consistent.
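As a rough sketch of that workflow, the helpers below only build the command lines (they do not run anything), so you can inspect them before use. The function names, the 1410 MHz value and the `./my_app` binary are hypothetical placeholders. The sketch assumes `nvidia-smi --lock-gpu-clocks` and `ncu --clock-control none` are available in your driver and ncu versions, and locking clocks typically needs administrator rights.

```python
def lock_clocks_cmd(mhz, gpu_index=0):
    """Build an nvidia-smi command that pins the GPU clock to a single frequency.

    `mhz` is a placeholder; pick a frequency your GPU actually supports.
    """
    return ["nvidia-smi", "-i", str(gpu_index), f"--lock-gpu-clocks={mhz},{mhz}"]


def ncu_cmd(app_argv, metrics, ncu_locks_clocks=False):
    """Build an ncu command that collects the given metrics.

    With ncu_locks_clocks=False, ncu's own clock locking is switched off
    so that the clocks set through nvidia-smi stay in effect.
    """
    cmd = ["ncu"]
    if not ncu_locks_clocks:
        cmd += ["--clock-control", "none"]
    cmd += ["--metrics", ",".join(metrics)]
    return cmd + list(app_argv)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(" ".join(lock_clocks_cmd(1410)))
    print(" ".join(ncu_cmd(["./my_app"], ["dram__bytes.sum.per_second"])))
```

Remember to reset the clocks afterwards (for example with `nvidia-smi --reset-gpu-clocks`) once profiling is done.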
For each roofline plotted in ncu, you can find the underlying metric names in the respective .section file. E.g., for the “Floating Point Operations Roofline”, everything is defined in SpeedOfLight_RooflineChart.section. This file is part of the ncu install dir and also deployed to your user’s Documents directory after first use.
The FMA performance counter expressed as FLOPS/S. At present, I can only find this expressed as a percentage
Each metric can be collected with different suffixes, as explained here. Note that the single-precision floating point roofline of ncu is not determined by this pipeline metric, but rather by counting various types of fma instructions, as detailed in the mentioned section file.
In the ncu 2024.3 UI, with the Metric Details tool window open, you can select an achieved value in a roofline chart and see the underlying metrics and formula for this value in the tool window.
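To make the derivation concrete, here is a minimal sketch of how the single-precision roofline point could be computed from raw metric values. The metric names are the ones I believe SpeedOfLight_RooflineChart.section uses (FADD and FMUL count as one FLOP, FFMA as two), so please check them against the section file in your install. The function name `roofline_point` is hypothetical, and the sketch assumes all values have already been converted to base units (bytes, cycles, seconds).

```python
def roofline_point(m):
    """Return achieved and peak roofline values from a dict of ncu metrics.

    `m` maps metric names to numeric values in base units.
    Metric names are assumptions; verify them in the .section file.
    """
    pre = "smsp__sass_thread_inst_executed_op_"
    suf = "_pred_on.sum.per_cycle_elapsed"
    # FADD and FMUL are one FLOP each, FFMA is two (multiply + add).
    flop_per_cycle = (
        m[pre + "fadd" + suf]
        + m[pre + "fmul" + suf]
        + 2.0 * m[pre + "ffma" + suf]
    )
    flops = flop_per_cycle * m["smsp__cycles_elapsed.avg.per_second"]
    bytes_per_s = m["dram__bytes.sum.per_second"]

    # Peaks at the clock frequency used during profiling.
    peak_flops = (
        2.0
        * m["sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained"]
        * m["sm__cycles_elapsed.avg.per_second"]
    )
    peak_bytes_per_s = (
        m["dram__bytes.sum.peak_sustained"]
        * m["dram__cycles_elapsed.avg.per_second"]
    )
    return {
        "flops_per_s": flops,
        "bytes_per_s": bytes_per_s,
        "arithmetic_intensity": flops / bytes_per_s,  # FLOP per byte (x axis)
        "peak_flops_per_s": peak_flops,
        "peak_bytes_per_s": peak_bytes_per_s,
        "ridge_point": peak_flops / peak_bytes_per_s,
    }
```

The kernel is plotted at (`arithmetic_intensity`, `flops_per_s`), and the two roofs are `peak_flops_per_s` and `peak_bytes_per_s` times the arithmetic intensity. Because the peaks are computed from the measured clock rates, they match the clocks in effect during profiling rather than the advertised boost figures.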