How to Understand Peak Traffic in the Roofline Model?

zy.lu · September 6, 2022, 7:51am

Is the peak traffic value in the Roofline Model a peak bandwidth that depends only on the GPU, or is it the peak achieved by the program?

In general, peak traffic in the Roofline Model is the peak bandwidth, which depends only on the hardware. The ProfilingGuide also explains it as the GPU memory transfer speed.

However, peak traffic in the default section file is defined using lts__t_bytes.sum.peak_sustained/lts__cycles_elapsed.avg.per_second, which looks like the peak bandwidth that the program reached. In practice, different programs do have different peak traffic.

So how should peak traffic be understood?
If we need a peak traffic value that depends only on the GPU, what can we do?

jmarusarz · September 6, 2022, 6:55pm

The peak is only related to the GPU, not varying based on the activity of the workload. The names of the metrics may be confusing, but the peak_sustained metric is basically the “peak value that the GPU could possibly sustain regardless of workload”. It’s hardcoded per GPU. Not related to anything sustained by the application. The per_second metric is just used to calculate the cycles/sec (clockspeed) of the GPU which is needed for the various ratios.
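As a rough illustration of the reply above, the roof can be thought of as a fixed per-cycle hardware peak multiplied by a clock rate. The sketch below assumes that the peak_sustained metric is expressed in bytes per cycle and that the per_second metric supplies cycles per second; the function name and the numbers are hypothetical and are not read from any real GPU.

```python
def peak_traffic_bytes_per_second(peak_bytes_per_cycle: float,
                                  cycles_per_second: float) -> float:
    """Return a roofline memory roof in bytes/second.

    peak_bytes_per_cycle: fixed hardware peak (assumed to correspond to a
        *.peak_sustained metric such as lts__t_bytes.sum.peak_sustained).
    cycles_per_second: clock rate (assumed to correspond to a
        *.per_second metric such as lts__cycles_elapsed.avg.per_second).
    """
    return peak_bytes_per_cycle * cycles_per_second


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the arithmetic.
    roof = peak_traffic_bytes_per_second(1024.0, 1.5e9)
    print(f"{roof / 1e9:.1f} GB/s")
```

Under this reading, the per-cycle peak never changes for a given GPU, and only the clock term can move the roof.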

zy.lu · September 7, 2022, 7:45am

Why do the roofline models of vectoradd and nbody have different peak traffic? They both run on an RTX 3090.

jmarusarz · September 7, 2022, 2:16pm

At the top of the report details page there is a “SM Frequency” metric. Can you check what that value is for the 2 results? It’s likely they are different and the roof is calculated based on the observed frequency during the run.
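A small sketch of why two reports on the same GPU can show different roofs: if the roof is the per-cycle peak times the observed clock, then a different observed frequency gives a different roof, and rescaling to a common reference clock makes the roofs comparable. All numbers and the helper name rescale_roof are hypothetical; this is not how Nsight Compute itself reports values.

```python
# Hypothetical per-cycle peak, identical for both runs on the same GPU.
PEAK_BYTES_PER_CYCLE = 2048.0

# Hypothetical observed clock rates (Hz) for the two profiled applications.
observed_hz = {"vectoradd": 1.40e9, "nbody": 1.70e9}

# Roof as plotted for each run: same per-cycle peak, different clock.
roofs = {name: PEAK_BYTES_PER_CYCLE * hz for name, hz in observed_hz.items()}


def rescale_roof(roof: float, observed: float, reference: float) -> float:
    """Rescale a roof computed at `observed` Hz to a `reference` Hz clock."""
    return roof * reference / observed


if __name__ == "__main__":
    reference_hz = 1.50e9  # hypothetical common clock for comparison
    for name, roof in roofs.items():
        same_clock = rescale_roof(roof, observed_hz[name], reference_hz)
        print(f"{name}: {roof / 1e9:.1f} GB/s observed, "
              f"{same_clock / 1e9:.1f} GB/s at the reference clock")
```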

zy.lu · September 8, 2022, 7:51am

Yes, I see that the “SM Frequency” and “DRAM Frequency” differ between applications. Is the “DRAM Frequency” equal to dram_cycles_elapsed.avg.per_second?

Sanjiv.Satoor · September 8, 2022, 9:04am

Yes.

If you hover the mouse over the metric in the Details page it shows the metric name.

zy.lu · September 9, 2022, 8:31am

Got it! Is there any way to get the maximum frequency of the SM, L1 cache, L2 cache, and DRAM? Which of these frequencies is the GPU Boost Clock listed in the white paper?

felix_dt · September 9, 2022, 8:35am

You can use the nvidia-smi utility to query your GPU's possible clock rates for SM and Memory; see nvidia-smi clocks -h for more details. You can use nvidia-smi --lock-gpu-clocks/--reset-gpu-clocks in combination with ncu --clock-control none to have the clocks set externally to Nsight Compute.
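One possible way to script this workflow is sketched below. It assumes nvidia-smi and ncu are on the PATH, that locking clocks is permitted on the machine (it typically needs administrator rights), and that the application path ./my_app is a placeholder. The helper names are hypothetical; only the command-line flags come from the tools themselves.

```python
import subprocess

# Query the maximum SM and memory clocks (MHz), one CSV line per GPU.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=clocks.max.sm,clocks.max.mem",
    "--format=csv,noheader,nounits",
]


def parse_max_clocks(csv_text: str) -> list:
    """Parse lines of the form 'sm_mhz, mem_mhz' into dictionaries."""
    clocks = []
    for line in csv_text.strip().splitlines():
        sm, mem = (float(field.strip()) for field in line.split(","))
        clocks.append({"sm_mhz": sm, "mem_mhz": mem})
    return clocks


def lock_clocks_cmd(mhz: int) -> list:
    """Build the command that pins the GPU clock to a single frequency."""
    return ["nvidia-smi", f"--lock-gpu-clocks={mhz},{mhz}"]


def profile_cmd(app: str) -> list:
    """Build an ncu command that leaves clock control to the caller."""
    return ["ncu", "--clock-control", "none", app]


if __name__ == "__main__":
    out = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    max_sm = int(parse_max_clocks(out.stdout)[0]["sm_mhz"])
    subprocess.run(lock_clocks_cmd(max_sm), check=True)
    try:
        subprocess.run(profile_cmd("./my_app"), check=True)  # placeholder app
    finally:
        # Always restore the default clock behaviour afterwards.
        subprocess.run(["nvidia-smi", "--reset-gpu-clocks"], check=True)
```

With the clocks pinned to the same value for every run, the roofs in different reports should line up, since the clock term no longer varies between applications.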

system · September 23, 2022, 8:36am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
