Description
There seems to be an increase in GPU memory usage when running model inference on Ampere-based cards relative to older architectures. Specifically, across multiple ONNX models I'm seeing roughly 3x more memory used on a 3080 than on a 1080 when running trtexec with the same TensorRT/CUDA versions.
Are there any general notes I can look at to help diagnose this problem?
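For reference, this is roughly how I'm measuring peak usage, as a minimal sketch: the engine path is a placeholder, and it assumes nvidia-smi and trtexec are on PATH and that no other processes are using the GPU.

```python
import subprocess
import threading
import time

# Placeholder path; substitute your own serialized TensorRT engine.
ENGINE = "model.engine"

peak_mib = 0
done = False

def poll_memory():
    """Sample used GPU memory via nvidia-smi every 100 ms and track the peak."""
    global peak_mib
    while not done:
        out = subprocess.check_output([
            "nvidia-smi",
            "--query-gpu=memory.used",
            "--format=csv,noheader,nounits",
        ])
        peak_mib = max(peak_mib, int(out.split()[0]))
        time.sleep(0.1)

sampler = threading.Thread(target=poll_memory)
sampler.start()
# Run inference on the prebuilt engine; the same command is used on both cards.
subprocess.run(["trtexec", f"--loadEngine={ENGINE}"], check=True)
done = True
sampler.join()
print(f"Peak GPU memory: {peak_mib} MiB")
```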
Environment
TensorRT Version: 8.5.0 (also 7.2.1)
GPU Type: 3080
Nvidia Driver Version: 510.108.03
CUDA Version: 11.6
CUDNN Version: 8.6
Operating System + Version: Ubuntu 20.04
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:22.09-py3