Incorrect Peak Performance Boundaries in Nsight Compute Roofline Charts


I’m having trouble understanding the compute roofline chart for a kernel running on an A100. The FP64 peak performance boundary is drawn at around 7.5 TFLOP/s (roughly the V100 peak) rather than at the roughly 9.5 TFLOP/s I expected for the A100. I’m attaching a screenshot from an analysis of my kernel. The tool correctly identifies the A100 GPU, but the peak performance shown when I hover over the plateau looks wrong. I’m using Nsight Compute version 2021.3. I also tried version 2022.2 (Linux), but there the pop-up tooltip does not appear when I hover over the boundaries.

Is something wrong here, or does the tool set the peak to 7.5 TFLOP/s because that is the nominal peak rather than the theoretical one?

Thanks for your help.

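The roofline is constructed from the clock rate at which the application actually ran during profiling, because that clock rate sets the achievable upper limit. A run at a lower SM clock therefore gets a proportionally lower compute ceiling than the peak quoted at boost clock.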

What is the clock_rate for your profiling run? You can find this under Device Attributes on the Session page in the Nsight Compute UI.
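
As a rough sanity check, here is a small Python sketch of how the FP64 ceiling scales with the SM clock. The hardware figures are my assumptions, not values taken from your report: 108 SMs and 32 FP64 FMA lanes per SM for the A100, with 1095 MHz and 1410 MHz used as example base and boost clocks. Substitute the clock_rate shown under Device Attributes for your run.

```python
# Rough estimate of the FP64 roofline ceiling as a function of SM clock.
# Assumed A100 figures (not from the profiler report): 108 SMs,
# 32 FP64 FMA lanes per SM, and 2 FLOPs per FMA.
SM_COUNT = 108
FP64_LANES_PER_SM = 32
FLOPS_PER_FMA = 2


def fp64_peak_tflops(clock_mhz, sm_count=SM_COUNT, lanes_per_sm=FP64_LANES_PER_SM):
    """Peak FP64 throughput in TFLOP/s at the given SM clock in MHz."""
    flops_per_cycle = sm_count * lanes_per_sm * FLOPS_PER_FMA
    return flops_per_cycle * clock_mhz * 1e6 / 1e12


def clock_mhz_for_peak(peak_tflops, sm_count=SM_COUNT, lanes_per_sm=FP64_LANES_PER_SM):
    """Inverse: SM clock in MHz implied by a given FP64 ceiling in TFLOP/s."""
    flops_per_cycle = sm_count * lanes_per_sm * FLOPS_PER_FMA
    return peak_tflops * 1e12 / flops_per_cycle / 1e6


if __name__ == "__main__":
    # Example clocks are assumptions; use the clock_rate from your own session.
    for label, mhz in (("assumed base clock", 1095), ("assumed boost clock", 1410)):
        print(f"{label}: {mhz} MHz -> {fp64_peak_tflops(mhz):.2f} TFLOP/s")
    print(f"clock implied by a 7.5 TFLOP/s plateau: {clock_mhz_for_peak(7.5):.0f} MHz")
```

Under these assumptions, a plateau near 7.5 TFLOP/s corresponds to a clock of about 1.09 GHz, which would be consistent with the profiler having run the kernel at base clock rather than boost. If that matches your clock_rate, the ceiling is expected; collecting with `--clock-control none` should let the GPU boost and raise the ceiling accordingly.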