Why is the peak FLOP/s in Nsight Compute much lower than the value in the white paper?

weipenghui_666 · February 9, 2023, 9:19am

For the NVIDIA A100 PCIe 40GB GPU (I read the NVIDIA Ampere architecture white paper for GA100):
I am using Nsight Compute to get the roofline and FLOP/s of a CUDA kernel.
Peak FP64 performance reported by Nsight Compute = 5.27 TFLOP/s
**Peak FP64 performance in the white paper = 9.7 TFLOPS** (this is obtained at the GPU boost clock; I found that the boost clock of the A100 is 1410 MHz)

How does Nsight Compute obtain its peak FP64 performance (FLOP/s)? I mean the hardware peak, not the value achieved by the application or algorithm.

rs277 · February 9, 2023, 6:11pm

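By default, Nsight Compute usually locks the GPU clock to the base frequency in order to provide run-to-run repeatability. A peak computed at the base clock is therefore lower than the white paper figure, which assumes the boost clock.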
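The gap between the two peaks can be checked with a little arithmetic. The sketch below is only an illustration, not how Nsight Compute computes its roofline internally: the function names are hypothetical, and the SM count (108) and FP64 units per SM (32) suggested in the usage note are assumptions taken from the GA100 white paper rather than from this thread. It assumes each FP64 unit retires one fused multiply-add (2 FLOPs) per clock.

```cpp
#include <cassert>

// Theoretical peak FP64 throughput in FLOP/s.
// Assumption: every FP64 unit completes one FMA (counted as 2 FLOPs) per clock cycle.
double peak_fp64_flops(int sm_count, int fp64_units_per_sm, double clock_hz) {
    assert(sm_count > 0 && fp64_units_per_sm > 0 && clock_hz > 0.0);
    const double flops_per_fma = 2.0;
    return sm_count * fp64_units_per_sm * flops_per_fma * clock_hz;
}

// Inverse of the formula above: the clock (in Hz) implied by a reported peak.
double implied_clock_hz(double peak_flops, int sm_count, int fp64_units_per_sm) {
    assert(peak_flops > 0.0 && sm_count > 0 && fp64_units_per_sm > 0);
    const double flops_per_fma = 2.0;
    return peak_flops / (sm_count * fp64_units_per_sm * flops_per_fma);
}
```

With the assumed A100 values, `peak_fp64_flops(108, 32, 1410e6)` gives about 9.75 TFLOP/s, in line with the white paper's 9.7 TFLOPS, while `implied_clock_hz(5.27e12, 108, 32)` gives roughly 762 MHz, which suggests the 5.27 TFLOP/s peak was computed at a clock far below boost.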

To remove this lock, change the bottom setting in this window:

weipenghui_666 · February 10, 2023, 2:17am

Thank you very much. Now the FLOP/s in Nsight Compute is almost the same as the white paper's figure.
I found that the NVIDIA A100's GPU boost clock is 1410 MHz (Ampere architecture white paper, page 36). The boost clock can be queried like this:

    cudaDeviceProp prop;
    // The second argument is the device index (here device 1; use 0 for the first GPU).
    CHECK_ERROR(cudaGetDeviceProperties(&prop, 1));
    int clock_rate_khz = prop.clockRate; // clockRate is an int, in kilohertz

How can I get the base clock frequency of the A100 or other GPUs?

rs277 · February 10, 2023, 2:29am

nvidia-smi -q will give you the base clock, in the “Clocks” section.

Another source is the TechPowerUp database. Here is the spec for the 80GB PCIe version of the A100; note that the base clocks vary between models.

weipenghui_666 · February 10, 2023, 2:51am

Thank you, very useful info, especially the Techpowerup database website.
