Different achieved values in Roofline

Hi, I am trying to use Nsight Compute to generate a hierarchical single-precision roofline with: ncu --kernel-name <kernel_name> --set roofline -o output ./<program>

The program I am running uses single precision only, so the achieved value in the “Hierarchical single precision roofline” should be the same as in the “floating point operation roofline”.
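To make that expectation concrete, here is a minimal sketch of how the points on a hierarchical roofline are usually defined. It assumes the general roofline model, not Nsight Compute's internal formulas, and the function name, memory-level labels, and numbers are all hypothetical. The achieved performance is one value for the whole kernel; only the arithmetic intensity changes with the memory level.

```python
def roofline_points(flops, duration_s, bytes_by_level):
    """Return {level: (arithmetic_intensity, achieved_flops_per_s)}.

    Assumption: general roofline model, not Nsight Compute's own formulas.
    """
    achieved = flops / duration_s  # identical for every memory level
    return {
        level: (flops / nbytes, achieved)  # intensity depends on the level
        for level, nbytes in bytes_by_level.items()
    }


# Hypothetical numbers for illustration only, not measured data.
points = roofline_points(
    flops=8_000,
    duration_s=2.0,
    bytes_by_level={"L1": 4_000, "L2": 2_000, "DRAM": 1_000},
)
for level, (intensity, achieved) in points.items():
    print(f"{level}: {intensity} FLOP/byte, {achieved} FLOP/s")
```

Under this model, a single-precision-only kernel should show the same achieved FLOP/s in both charts, with the hierarchical chart differing only in where each point sits along the intensity axis.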

When I run it on an H100 machine, the result looks good: the achieved value in the “Hierarchical Single Precision” roofline is the same as in the “floating point operation roofline”.

But when I run it on an A100, the achieved value in the hierarchical roofline is completely different from the one in the “floating point roofline”, and it looks like a strange number. Is something wrong with how the achieved value is calculated on the A100 machine, and how is it calculated?

Nsight-Compute Version: 2021.3.1
CUDA Version: 11.5.119
GPU: A100(40GB), H100(H100 PCIe)

All related figures are attached.

Roofline.pdf (188.1 KB)

Support for Hopper H100 was added in Nsight Compute version 2022.3 (as part of CUDA 11.8).

It is not clear how you could use Nsight Compute 2021.3.1 for H100. Can you please reconfirm this?

Sorry for the confusion.
The driver and Nsight Compute versions listed above are from the A100 machine.
I have lost access to the H100 machine, but I remember that its CUDA version was 12.0.


Hmm, this seems like a bug to me. Are you able to share the A100 report? Also, if you open it in a 12.0 GUI, do you see the same strange value?
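In the meantime, a rough manual cross-check may help narrow it down. The sketch below assumes the common convention of counting each single-precision add and multiply as one FLOP and each fused multiply-add as two. The instruction counts, the duration, and the helper names are hypothetical placeholders, not values or APIs from Nsight Compute.

```python
import math


def sp_flops(fadd, fmul, ffma):
    """Single-precision FLOP count; an FMA counts as two operations (assumed convention)."""
    return fadd + fmul + 2 * ffma


def achieved_flops_per_s(fadd, fmul, ffma, duration_s):
    """Achieved performance in FLOP/s for one kernel launch."""
    return sp_flops(fadd, fmul, ffma) / duration_s


def charts_agree(value_a, value_b, rel_tol=0.01):
    """True if two achieved values match within a relative tolerance."""
    return math.isclose(value_a, value_b, rel_tol=rel_tol)


# Hypothetical counts and duration, for illustration only.
expected = achieved_flops_per_s(fadd=1_000, fmul=1_000, ffma=4_000, duration_s=2.0)
print(expected)
```

Comparing a hand-computed value like `expected` against the number each chart reports, for example with `charts_agree`, would show which of the two A100 charts is the one that is off.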