TF32 TFLOPs of GeForce RTX 3090 vs A40

Hi NVIDIA team,

I believe there is an error in the TF32 Tensor TFLOPS computation for the GA102 family:

In particular, page 25 states that each GA10x SM can execute:
128 FP16 FMA ops per clock (dense) = 64 FP32 FMA ops per clock

When computing the Peak TF32 Tensor TFLOPS for the 3090, the numbers (pages 44-45) work out using the formula
FMA ops per clock * Tensor Cores * GPU Boost Clock (MHz), i.e.
64 * 328 * 1695 = 35.6 TFLOPS

However, the numbers don't work for the A40 (pages 15-16), which belongs to the same family:
64 * 336 * 1740 = 37.4 TFLOPS, which is half of the reported 74.8 TFLOPS.
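For reference, here is a small Python sketch of the arithmetic above; the helper name `peak_tflops` is my own, not from any NVIDIA document:

```python
def peak_tflops(fma_ops_per_clock, tensor_cores, boost_clock_mhz):
    """Peak tensor throughput: per-clock FMA ops x tensor cores x clock.

    Clock is given in MHz, so multiply by 1e6 to get Hz, then divide
    by 1e12 to express the result in TFLOPS.
    """
    return fma_ops_per_clock * tensor_cores * boost_clock_mhz * 1e6 / 1e12

# GeForce RTX 3090: 328 tensor cores at 1695 MHz boost clock
rtx_3090 = peak_tflops(64, 328, 1695)  # ~35.6, matches the whitepaper

# A40: 336 tensor cores at 1740 MHz boost clock
a40 = peak_tflops(64, 336, 1740)       # ~37.4, half of the reported 74.8

print(f"RTX 3090: {rtx_3090:.1f} TFLOPS")
print(f"A40:      {a40:.1f} TFLOPS")
```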

Is there something I'm missing, or are the reported numbers wrong?

Thank you,

There is nothing wrong with the numbers. The Tensor Core units in those two GPUs do not necessarily behave in precisely the same way. As far as I know, a detailed description of the differences is unpublished.

Thank you for the quick reply!