Increased GPU memory footprint with Ampere architecture


There seems to be an increase in GPU memory use when running model inference on Ampere-based cards, relative to older architectures. Specifically, across multiple ONNX models I’m seeing around 3x more memory used on a 3080 than on a 1080 when running trtexec with the same TensorRT/CUDA versions.

Are there any general notes I can look at to help diagnose this problem?


TensorRT Version: 8.5.0 (also 7.2.1)
GPU Type: 3080
Nvidia Driver Version: 510.108.03

CUDA Version: 11.6
CUDNN Version: 8.6
Operating System + Version: Ubuntu 20.04
Baremetal or Container (if container which image + tag):
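To quantify the 3x difference on each card, one option is to sample device memory with nvidia-smi while trtexec runs. This is a minimal sketch, not part of TensorRT: the function names are hypothetical, it assumes nvidia-smi is on PATH, and because `memory.used` is device-wide it subtracts an idle baseline taken just before launch. Other processes using the GPU at the same time will skew the result.

```python
from __future__ import annotations

import subprocess
import time


def parse_memory_used_mib(smi_output: str) -> list[int]:
    """Parse output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`:
    one integer (MiB) per non-empty line, one line per GPU."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def memory_used_mib(gpu_index: int = 0) -> int:
    """Device-wide memory in use on one GPU (all processes), in MiB."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_memory_used_mib(out)[0]


def peak_extra_memory_mib(cmd: list[str], gpu_index: int = 0,
                          interval_s: float = 0.2) -> int:
    """Run `cmd` and return its approximate peak GPU memory above the idle
    baseline, in MiB. Polling can miss spikes shorter than `interval_s`."""
    baseline = memory_used_mib(gpu_index)
    peak = baseline
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        peak = max(peak, memory_used_mib(gpu_index))
        time.sleep(interval_s)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return peak - baseline
```

For example, calling `peak_extra_memory_mib(["trtexec", "--onnx=model.onnx"])` on both the 1080 and the 3080 machine (with `model.onnx` standing in for one of your models) gives two numbers whose ratio can be compared directly.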


Because we have developed more kernels for Ampere GPUs, more GPU memory is used to hold them.
cuDNN and other libraries such as cuBLAS also need more memory on newer GPUs.

You can try limiting the memory by using the --memPoolSize=<pool_spec> option in trtexec:

  • --memPoolSize=<pool_spec>: Specify the maximum size of the workspace that tactics are allowed to use, as well as the sizes of the memory pools that DLA will allocate per loadable.
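A minimal sketch of assembling the pool spec programmatically, assuming the format listed by TensorRT 8.5's `trtexec --help`: comma-separated `pool:size` entries with sizes in MiB and pool names `workspace`, `dlaSRAM`, `dlaLocalDRAM`, `dlaGlobalDRAM`. `mem_pool_arg` is a hypothetical helper; check `trtexec --help` on your version before relying on this format.

```python
from __future__ import annotations

# Pool names as assumed from TensorRT 8.5's `trtexec --help`; verify locally.
KNOWN_POOLS = {"workspace", "dlaSRAM", "dlaLocalDRAM", "dlaGlobalDRAM"}


def mem_pool_arg(pools: dict[str, float]) -> str:
    """Build a `--memPoolSize=<pool_spec>` argument from {pool: size_in_MiB}."""
    if not pools:
        raise ValueError("at least one pool is required")
    parts = []
    for name, size_mib in pools.items():
        if name not in KNOWN_POOLS:
            raise ValueError(f"unknown pool {name!r}")
        if size_mib <= 0:
            raise ValueError(f"pool {name!r} must have a positive size")
        # Print whole numbers without a trailing ".0"; keep fractional MiB as-is.
        size_text = str(int(size_mib)) if float(size_mib).is_integer() else str(size_mib)
        parts.append(f"{name}:{size_text}")
    return "--memPoolSize=" + ",".join(parts)
```

For example, `subprocess.run(["trtexec", "--onnx=model.onnx", mem_pool_arg({"workspace": 1024})], check=True)` would cap the tactic workspace at 1024 MiB (`model.onnx` is a placeholder). On a 3080 only the `workspace` pool is relevant, since the DLA pools apply to DLA-equipped devices.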

Thank you.