TFLOPS performance of RTX 3080 or 3090

Hi, when I use an RTX 3080 or 3090 with CUDA 11.1 and cuDNN 8.0.5, the measured throughput cannot reach 30/36 TFLOPS respectively. The framework used is PyTorch.
What is the reason, and how can that performance be reached?
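
For context, the 30/36 TFLOPS figures match the theoretical FP32 peak (CUDA cores x 2 FLOPs per FMA x boost clock), which real workloads rarely sustain. The sketch below shows that arithmetic and one possible way to time a large matmul in PyTorch. The helper names (`peak_fp32_tflops`, `matmul_tflops`, `measure_matmul_tflops`), the matrix size, and the iteration count are assumptions for illustration, not part of the original post; the core counts and boost clocks are the published specs.

```python
import time

# Published specs: (CUDA core count, boost clock in GHz).
GPU_SPECS = {
    "RTX 3080": (8704, 1.710),
    "RTX 3090": (10496, 1.695),
}


def peak_fp32_tflops(cuda_cores, boost_ghz):
    """Theoretical FP32 peak: one FMA (2 FLOPs) per core per clock."""
    # cores * 2 * GHz gives GFLOPS; divide by 1e3 for TFLOPS.
    return cuda_cores * 2 * boost_ghz / 1e3


def matmul_tflops(n, seconds):
    """Achieved TFLOPS for one n x n by n x n matmul (about 2 * n^3 FLOPs)."""
    return 2 * n**3 / seconds / 1e12


def measure_matmul_tflops(n=8192, iters=50, dtype=None):
    """Hypothetical benchmark helper; needs PyTorch and a CUDA GPU."""
    import torch  # imported lazily so the arithmetic above runs without it

    dtype = dtype or torch.float32
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)

    for _ in range(5):  # warm-up so one-time setup costs are not timed
        a @ b
    torch.cuda.synchronize()  # kernels launch asynchronously; wait first

    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()  # wait for all queued kernels to finish
    elapsed = time.perf_counter() - start

    return matmul_tflops(n, elapsed / iters)


if __name__ == "__main__":
    for name, (cores, clock) in GPU_SPECS.items():
        print(f"{name}: {peak_fp32_tflops(cores, clock):.1f} TFLOPS peak FP32")
```

Calling `measure_matmul_tflops()` on the GPU gives a number to compare against the peak; small matrices, missing synchronization, or memory-bound layers will all report far less than the theoretical figure.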

If you aim for 180/200 TFLOPS, you have to use TensorFlow.