H100 PCIe hgemm cannot reach peak performance

kanmo · May 3, 2024, 4:12pm

Hi there,

I evaluated hgemm on an H100 PCIe with the CUTLASS profiler, cuBLAS, and Triton, but performance tops out at about 400 TFLOPS. However, the peak performance listed in the whitepaper is 756 TFLOPS. I am not sure whether the 400 TFLOPS result is by design or whether I measured it incorrectly.

I ran the evaluation with driver 535.129.03 and CUDA 12.2.1. nvidia-smi shows the H100 running at 1 GHz, 350 W, and ~60 °C, but the peak frequency should be 1.75 GHz. However, if I run hgemm with zero matrices as inputs, the H100 reaches 1.75 GHz and 700+ TFLOPS.

Similar results have been reported by others as well, for example on Reddit.
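
As a rough sanity check on the numbers in the post above, one can scale the whitepaper peak by the ratio of the observed clock to the rated clock. This is only a back-of-the-envelope sketch: it assumes tensor-core throughput scales linearly with SM clock, and it takes the 756 TFLOPS and 1.75 GHz figures from the post as given. The function name `clock_scaled_peak` is made up for illustration.

```python
def clock_scaled_peak(peak_tflops: float,
                      rated_clock_ghz: float,
                      observed_clock_ghz: float) -> float:
    """Estimate the peak reachable at a reduced clock.

    Assumption: throughput is proportional to SM clock frequency.
    """
    return peak_tflops * observed_clock_ghz / rated_clock_ghz


# Figures quoted in the post: 756 TFLOPS peak, 1.75 GHz rated, 1 GHz observed.
estimate = clock_scaled_peak(756.0, 1.75, 1.0)
print(estimate)  # 432.0
```

Under that assumption, a GPU held at 1 GHz could not exceed roughly 432 TFLOPS, so a measured value of about 400 TFLOPS is close to what the reduced clock allows.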

Robert_Crovella · May 3, 2024, 8:16pm

It’s not realistic to expect to reach peak performance.

Running gemm or tensorcore codes will often cause the GPU to throttle its clocks to stay within an appropriate power envelope.

Yes, the input data pattern can affect power consumption, and therefore measured performance.
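
One way to observe this kind of throttling while a benchmark runs is to poll `nvidia-smi` for the SM clock, power draw, temperature, and the software power-cap flag. The snippet below is a minimal sketch, not an official tool: the helper names `parse_sample` and `query_gpu` are hypothetical, and the query field `clocks_throttle_reasons.sw_power_cap` is assumed to be available in the installed driver (`nvidia-smi --help-query-gpu` lists the fields a given driver supports).

```python
import subprocess

# Fields passed to `nvidia-smi --query-gpu`. Availability depends on the driver.
FIELDS = [
    "clocks.sm",
    "power.draw",
    "temperature.gpu",
    "clocks_throttle_reasons.sw_power_cap",
]


def parse_sample(line: str) -> dict:
    """Parse one line of `--format=csv,noheader,nounits` output."""
    sm_mhz, power_w, temp_c, power_cap = [p.strip() for p in line.split(",")]
    return {
        "sm_clock_mhz": int(sm_mhz),
        "power_w": float(power_w),
        "temp_c": int(temp_c),
        # nvidia-smi reports this flag as "Active" or "Not Active".
        "power_capped": power_cap == "Active",
    }


def query_gpu(index: int = 0) -> dict:
    """Run nvidia-smi once for the given GPU index and parse the result."""
    out = subprocess.run(
        [
            "nvidia-smi",
            f"--id={index}",
            "--query-gpu=" + ",".join(FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_sample(out.strip().splitlines()[0])


# Usage (requires an NVIDIA driver on the machine):
#   sample = query_gpu()
#   print(sample["sm_clock_mhz"], sample["power_capped"])
```

If `power_capped` is true while the SM clock sits well below the rated boost clock during the GEMM run, that is consistent with the power-envelope explanation above.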

kanmo · May 3, 2024, 9:37pm

Hi Robert, thanks for your quick reply. Since I want to calibrate the H100 for further performance evaluations, are there any reference values for achievable hgemm performance? I am worried that I did not install the H100 correctly, since it achieves less than 60% of the theoretical performance.
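
For readers who want to reproduce the "less than 60%" figure, the sketch below shows the usual way achieved GEMM throughput is derived from a timing and compared against a quoted peak. It uses the conventional count of 2·M·N·K floating-point operations per GEMM. The matrix size and the 2.75 ms timing are hypothetical values chosen for illustration, not measurements from this thread.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOPS for one M x N x K GEMM that took `seconds`."""
    flops = 2.0 * m * n * k  # one multiply and one add per inner-product term
    return flops / seconds / 1e12


def fraction_of_peak(achieved_tflops: float, peak_tflops: float) -> float:
    """Ratio of achieved throughput to the quoted peak."""
    return achieved_tflops / peak_tflops


# Hypothetical timing: an 8192 x 8192 x 8192 hgemm finishing in 2.75 ms.
achieved = gemm_tflops(8192, 8192, 8192, 2.75e-3)

# 756 TFLOPS is the whitepaper peak quoted earlier in the thread.
ratio = fraction_of_peak(achieved, 756.0)
print(f"{achieved:.1f} TFLOPS, {ratio:.1%} of peak")
```

With these illustrative inputs the result is close to 400 TFLOPS, or about 53% of the quoted 756 TFLOPS peak.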

Robert_Crovella · May 6, 2024, 1:51pm

I’m not aware of any published data in this area. It is common for GPUs to achieve varying percentages of their peak performance.

system · May 20, 2024, 1:51pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
