Poor TensorRT inference performance on Ampere

Description

I am trying to perform real-time inference on a video stream using an SSD model and I am getting 65-70 ms inference time per frame on the RTX 3070, compared to roughly 20 ms on a GeForce RTX 2080 / Quadro RTX 5000.
Tested on both Ubuntu 18.04 and Ubuntu 20.10.

Environment

TensorRT Version: 7.2.2.3
GPU Type: GeForce RTX 3070
Nvidia Driver Version: 460.39
CUDA Version: 11.2 / 11.1
CUDNN Version: 8.0.5
Operating System + Version: Ubuntu 18.04 / Ubuntu 20.10

Relevant Files

Steps To Reproduce

I am using a modified version of the Python sample script in /usr/src/tensorrt/samples/python/uff_ssd, with an added OpenCV camera stream.
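
A minimal sketch of the timing loop (placeholders only, not the exact modified script: the engine file name "ssd.engine", the 300x300 NCHW input, the SSD Inception V2 preprocessing, max batch size 1, and binding 0 being the input are assumptions to adjust to the actual uff_ssd setup). Camera capture and preprocessing are excluded from the measured time, and the first frames are treated as warm-up:

# Timing sketch for a deserialized implicit-batch engine (TensorRT 7 Python API).
# Assumptions: serialized engine "ssd.engine", 300x300 NCHW input,
# engine built with max batch size 1, binding 0 is the input.
import time
import cv2
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open("ssd.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate page-locked host buffers and device buffers for every binding.
host_bufs, dev_bufs, bindings = [], [], []
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    host = cuda.pagelocked_empty(size, dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))
stream = cuda.Stream()

def preprocess(frame):
    # Resize and normalize to the SSD Inception V2 input (NCHW, 300x300 assumed).
    img = cv2.resize(frame, (300, 300)).astype(np.float32)
    img = (2.0 / 255.0) * img - 1.0
    return img.transpose(2, 0, 1).ravel()

cap = cv2.VideoCapture(0)
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    np.copyto(host_bufs[0], preprocess(frame))  # capture/preprocess not timed

    start = time.perf_counter()
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle)
    for host, dev in zip(host_bufs[1:], dev_bufs[1:]):
        cuda.memcpy_dtoh_async(host, dev, stream)
    stream.synchronize()
    elapsed_ms = (time.perf_counter() - start) * 1000.0

    frame_idx += 1
    if frame_idx > 10:  # skip the first frames as warm-up
        print("inference: %.1f ms" % elapsed_ms)
cap.release()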

Hi, please share the model, script, profiler output, and performance measurements so that we can help you better.
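
For the profiler output, one option (a sketch, assuming the implicit-batch engine and the "context"/"bindings" variables set up as in the inference loop above) is to attach TensorRT's built-in per-layer profiler to the execution context; note that in TensorRT 7 per-layer profiling works with the synchronous execute() path:

# Built-in per-layer profiler: layer timings are printed to stdout after each execute().
# "context" and "bindings" are assumed to be set up as in the script above.
context.profiler = trt.Profiler()
context.execute(batch_size=1, bindings=bindings)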

Alternatively, you can try running your model with the trtexec command (see the example invocation below)
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
or review these tips for optimizing performance
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html
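
For example, something along these lines for a UFF SSD model (the file name, input/output tensor names, and dimensions below are placeholders; substitute the ones from your conversion, and drop --fp16 if you want FP32 numbers):

trtexec --uff=sample_ssd.uff --uffInput=Input,3,300,300 --output=NMS --workspace=1024 --fp16 --iterations=100 --avgRuns=10 --dumpProfile

Comparing the latency trtexec reports on the RTX 3070 with the RTX 2080 / Quadro RTX 5000 numbers will show whether the slowdown is in the engine itself or in the capture/preprocessing path, and --dumpProfile gives a per-layer breakdown.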

Thanks!