Hi, I am trying to use Nsight Compute to generate a hierarchical single-precision roofline with the following command:
ncu --kernel-name <kernel_name> --set roofline -o output ./<program>
The program I am running uses single precision only, so the achieved value in the “Hierarchical single precision roofline” should be the same as in the “floating point operation roofline”.
When I run it on an H100 machine, the result looks good: the achieved value in the “Hierarchical single precision roofline” is the same as in the “floating point operation roofline”.
But when I run it on an A100, the achieved value in the hierarchical roofline is completely different from the one in the “floating point operation roofline”, and the number looks strange. Is something going wrong when the achieved value is calculated on the A100? If not, how is it calculated?
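To make my expectation concrete, here is a minimal sketch of how I assume the achieved point is derived in a roofline model. It is not taken from Nsight Compute: the function names, the counter values, and the convention of counting a fused multiply-add as two FLOPs are my own assumptions. Under this model the achieved FLOP rate is the same for every memory level, and only the arithmetic intensity changes.

```python
# Sketch of the roofline arithmetic I am assuming; all names and numbers
# below are hypothetical and are not Nsight Compute metrics or output.

def total_flops(fadd, fmul, ffma):
    """Count one FLOP per add or multiply and two per fused multiply-add."""
    return fadd + fmul + 2 * ffma

def achieved_flops_per_second(flops, elapsed_seconds):
    """Achieved performance: the y-coordinate of the roofline point."""
    return flops / elapsed_seconds

def arithmetic_intensity(flops, bytes_moved):
    """FLOPs per byte at one memory level: the x-coordinate of the point."""
    return flops / bytes_moved

if __name__ == "__main__":
    # Hypothetical single-precision instruction counts for one kernel.
    flops = total_flops(fadd=1000, fmul=500, ffma=2000)
    rate = achieved_flops_per_second(flops, elapsed_seconds=0.5)

    # Hypothetical bytes transferred at each level of the hierarchy.
    bytes_per_level = {"L1": 8000, "L2": 4000, "DRAM": 2000}
    for level, nbytes in bytes_per_level.items():
        # Same achieved rate at every level; only the intensity differs.
        print(level, arithmetic_intensity(flops, nbytes), rate)
```

If this assumption is right, the hierarchical chart should place its points at the same height as the floating point operation roofline, which is what I see on the H100 but not on the A100.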
Nsight-Compute Version: 2021.3.1
CUDA Version: 11.5.119
GPU: A100 (40 GB), H100 (PCIe)
All related figures are attached.
Thanks.
Roofline.pdf (188.1 KB)