A40 and 3090 GEMM performance test data

Environment: Ubuntu 22.04, cublasMatmulBench (GPU clocks were locked for the test)
A40 data:

| INT8 | FP16 | TF32 | FP32 |
| --- | --- | --- | --- |
| 163 TFLOPS | 116 TFLOPS | 70 TFLOPS | 20.2 TFLOPS |

3090 data:

| TF32 | FP32 |
| --- | --- |
| 58.6 TFLOPS | 21.1 TFLOPS |

My question: the numbers I measured are very different from the official peak TFLOPS figures. What is the reason?
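
For anyone comparing against these numbers, here is a rough sketch of the arithmetic usually behind a GEMM throughput figure (2·M·N·K operations divided by elapsed time) and of how an FMA-based peak scales with clock speed. It is only an illustration under stated assumptions: the matrix shape, elapsed time, lane count, and both clock values are hypothetical placeholders, not values from my test or from any datasheet, and the function names are made up for this example.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput of one m x n x k GEMM.

    A GEMM performs m*n*k multiply-adds, counted as 2*m*n*k operations.
    """
    return 2.0 * m * n * k / seconds / 1e12


def fma_peak_tflops(lanes: int, clock_mhz: float) -> float:
    """Peak throughput if each of `lanes` units retires one FMA (2 ops) per clock."""
    return 2.0 * lanes * clock_mhz * 1e6 / 1e12


# Hypothetical placeholders, NOT measurements or datasheet values.
m = n = k = 8192
elapsed_s = 0.055        # assumed time for one GEMM call
lanes = 10_000           # assumed number of FMA lanes
boost_mhz = 1_700.0      # assumed boost clock used for a quoted peak
locked_mhz = 1_400.0     # assumed clock the GPU was locked to

achieved = gemm_tflops(m, n, k, elapsed_s)
boost_peak = fma_peak_tflops(lanes, boost_mhz)
locked_peak = fma_peak_tflops(lanes, locked_mhz)

print(f"achieved:        {achieved:.2f} TFLOPS")
print(f"peak at boost:   {boost_peak:.2f} TFLOPS")
print(f"peak at locked:  {locked_peak:.2f} TFLOPS")
print(f"achieved / locked peak: {achieved / locked_peak:.1%}")
```

The point of the second function is that a peak computed this way is proportional to the clock, so the clock the GPU is locked to matters when comparing a measurement against a quoted peak.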