CUDA roofline analysis when kernel is below the roof

nokanaran · February 26, 2023, 8:31pm

Hi,

I have this roofline analysis from ncu, but I am not able to understand it fully. Is my kernel compute-bound or memory-bound? It seems to be neither. The GPU is an RTX 2080.

jmarusarz · February 27, 2023, 9:00pm

Based on where your kernel sits on the roofline chart, it is not currently limited by the peak throughput of either the memory or the compute subsystem. Not every kernel is bound by those hardware limits.
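As a rough illustration of what "below the roof" means, here is a minimal Python sketch of the basic roofline formula: the ceiling at a given arithmetic intensity is the smaller of the compute peak and intensity times memory bandwidth. The function names, the 60% "near the roof" threshold, and the example numbers are all assumptions made for illustration; they are not taken from ncu or from the report in this thread.

```python
def attainable_flops(ai, peak_flops, peak_bw):
    """Roofline ceiling (FLOP/s) at arithmetic intensity `ai` (FLOP/byte)."""
    return min(peak_flops, ai * peak_bw)


def classify(ai, achieved_flops, peak_flops, peak_bw, near=0.6):
    """Label a kernel relative to the roofline.

    `near` is an arbitrary, hypothetical threshold: a kernel reaching less
    than this fraction of its ceiling is treated as bound by neither roof.
    """
    roof = attainable_flops(ai, peak_flops, peak_bw)
    if achieved_flops / roof < near:
        return "below the roof"
    ridge = peak_flops / peak_bw  # intensity where the two roofs meet
    return "memory-bound" if ai < ridge else "compute-bound"


# Hypothetical numbers, for illustration only.
PEAK_FLOPS = 300e9  # FLOP/s
PEAK_BW = 448e9     # bytes/s

print(classify(0.25, 20e9, PEAK_FLOPS, PEAK_BW))   # far under the ceiling
print(classify(0.25, 100e9, PEAK_FLOPS, PEAK_BW))  # close to the bandwidth roof
print(classify(4.0, 280e9, PEAK_FLOPS, PEAK_BW))   # close to the compute roof
```

A kernel in the first category is typically held back by something the two roofs do not capture, such as latency, low occupancy, or instruction mix.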

nokanaran · February 28, 2023, 9:16pm

I’m curious to know what’s going on. Where can I find resources for this topic? If the hardware is not limiting, why doesn’t it run faster or reach the memory and processor limits?

jmarusarz · March 9, 2023, 9:42pm

There’s a good overview of the roofline model here: Roofline Performance Model - NERSC Documentation. In general, if you’re nowhere near the roofs, take a look at the other sections of the report. They may have information on what else could be limiting your performance.

rs277 · March 9, 2023, 10:04pm

The roofline analysis mentions double precision. You have an RTX2080, a Turing SM7.5 GPU.

Looking at the arithmetic instruction throughput table in the Programming Guide here, for compute capability 7.x, 64-bit floating-point add, multiply, and multiply-add are listed as having a throughput of 32 results per clock cycle per SM. However, if you follow the “5” footnote next to that entry, you find the throughput is actually only 2 per cycle for SM 7.5.

This could explain your poor performance.
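To put a number on that, here is a back-of-the-envelope sketch of the resulting double-precision peak. The per-SM throughput of 2 comes from the footnote above. The SM count (46), the boost clock (1.71 GHz), and the FP32 rate of 64 results per cycle per SM are my assumptions for a reference RTX 2080, so check them against your own card (for example with deviceQuery) before relying on the result.

```python
def peak_gflops(num_sms, results_per_cycle_per_sm, clock_ghz, flops_per_result=2):
    """Theoretical peak in GFLOP/s.

    flops_per_result=2 counts a fused multiply-add as two floating-point ops.
    """
    return num_sms * results_per_cycle_per_sm * flops_per_result * clock_ghz


# Assumed reference RTX 2080 figures; verify for your specific board.
NUM_SMS = 46
BOOST_CLOCK_GHZ = 1.71

fp64 = peak_gflops(NUM_SMS, 2, BOOST_CLOCK_GHZ)   # 2 per cycle per SM (SM 7.5 footnote)
fp32 = peak_gflops(NUM_SMS, 64, BOOST_CLOCK_GHZ)  # 64 per cycle per SM (assumed for 7.x)

print(f"FP64 peak: {fp64:.0f} GFLOP/s")
print(f"FP32 peak: {fp32:.0f} GFLOP/s")
print(f"FP32/FP64 ratio: {fp32 / fp64:.0f}x")
```

Under these assumptions the double-precision roof is about 32 times lower than the single-precision one, which is worth keeping in mind when reading where the kernel lands relative to each roof.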
