So it seems that they are equal. My question is about the performance of multiplying in FP16 and accumulating in FP32. Is it the same as the FP32 peak performance? (I expected FP16 with FP16 accumulation to sometimes double the performance of FP16 with FP32 accumulation.)
The only place I know of where that happens is in the Tensor Core (TC) units, which are used for matrix-matrix multiplication. So for TC, and specifically for matrix-matrix multiplication, the throughput is not the same as either the FP16 or FP32 throughput. Based on the whitepaper, the peak theoretical TC throughput for the FP16-multiply/FP32-accumulate path should be around 70 TFLOPS (for the RTX 3090).
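For reference, the FP16-input/FP32-accumulate Tensor Core path being discussed is exposed directly in CUDA through the WMMA API: the input fragments are `half` while the accumulator fragment is `float`. A minimal sketch (one warp computing a single 16x16x16 tile; swapping the accumulator type to `half` selects the FP16-accumulate path instead):

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes D = A*B + C for a 16x16x16 tile.
// Inputs are FP16; the accumulator fragment is FP32.
__global__ void wmma_fp16_fp32acc(const half *A, const half *B, float *C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;  // FP32 accumulate

    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, A, 16);
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // Tensor Core MMA
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}
```

This is only a sketch of the API shape, not a benchmark; measuring the two accumulation paths would require timing full GEMMs (e.g. via cuBLAS with `CUBLAS_COMPUTE_16F` vs `CUBLAS_COMPUTE_32F`) at sizes large enough to saturate the TC units.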
For TC matrix-matrix multiply, that is correct, and it is also covered in the whitepaper (e.g. Table 2).