RTX3070 performance with TensorRT

Description

I am running ResNet18 inference on an RTX3070 and get a throughput below 300/s at batch size 50. In the same environment, an RTX2080Ti reaches 500/s.
Is this a normal result? Or do the drivers still need to be optimized for this GPU?

Environment

TensorRT Version: 7.2.1
GPU Type: RTX3070
Nvidia Driver Version: 455.45
CUDA Version: 11.1
CUDNN Version: 8.0.5
Operating System + Version: Ubuntu 16.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

NA

Steps To Reproduce

NA

Hi @gahang,

Could you please share the model and the script needed to reproduce this issue?
Also, if possible, please share the verbose logs and profiler output from both test runs (RTX3070 and RTX2080Ti).
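
If it helps to make the two numbers directly comparable, here is a minimal sketch of a timing harness you could run unchanged on both GPUs. It is not taken from your setup: `measure_throughput`, `infer_fn`, and `fake_infer` are hypothetical names, and `fake_infer` only stands in for your real inference call. The sketch assumes `infer_fn` blocks until the GPU work for one batch has finished; if your call is asynchronous, the result will be misleading.

```python
import time


def measure_throughput(infer_fn, batch_size, warmup=10, iters=100):
    """Return samples per second for infer_fn.

    Assumption: infer_fn runs exactly one batch and returns only after
    the work is complete (i.e. it is synchronous).
    """
    # Warm-up runs are excluded from timing so that one-time setup
    # costs do not distort the measurement.
    for _ in range(warmup):
        infer_fn()

    start = time.perf_counter()
    for _ in range(iters):
        infer_fn()
    elapsed = time.perf_counter() - start

    return batch_size * iters / elapsed


def fake_infer():
    # Hypothetical placeholder for the real inference call.
    time.sleep(0.001)


if __name__ == "__main__":
    rate = measure_throughput(fake_infer, batch_size=50)
    print(f"throughput: {rate:.1f} samples/s")
```

Reporting the batch size, warm-up count, and iteration count together with the result for each card would make the comparison easier to interpret.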

Thanks