Peak memory usage during TensorRT execution


I’m trying to extract peak memory usage information for a TensorRT engine execution.
I noticed that the trtexec engine-build log contains a line like this:

[I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 59 MiB, GPU 13 MiB
  1. What exactly does the peak memory usage mean here, and can it be considered reliable data for the peak memory of a TensorRT engine execution? I got confused because with some models the reported peak is surprisingly small. For example, running trtexec with the attached ONNX model, which is Swin_b with fake quantization, reports only 4 MiB as GPU peak memory (below), whereas the model itself is about 340 MB (approx. 85 MB considering quantization).
[I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 4 MiB
  2. If it’s not actually equivalent to the real peak memory usage, are there any alternatives?


TensorRT Version: 8.6.1
GPU Type: RTX A6000
Nvidia Driver Version: 510.108.03
CUDA Version: 12.1
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

ONNX model


To reproduce the Swin_b result:

trtexec --onnx=[onnx_path] --int8


Peak memory usage reporting is not accurate in TRT 8.6; it will be fixed in an upcoming major release.
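In the meantime, one possible workaround is to route TensorRT's device allocations through a custom allocator (the Python API exposes tensorrt.IGpuAllocator for this) and track the high-water mark yourself. The bookkeeping that such an allocator would do is essentially the following; note this is a pure-Python illustration with hypothetical names, not a real TensorRT integration (a real version would subclass tensorrt.IGpuAllocator and forward to actual device allocations):

```python
class PeakTracker:
    """Bookkeeping a custom allocator could do to find the true peak.

    Illustrative sketch only: a real TensorRT integration would subclass
    tensorrt.IGpuAllocator and perform real device allocations; here we
    just simulate the accounting with fake pointer handles.
    """

    def __init__(self):
        self.live = {}       # fake ptr -> size of currently live allocations
        self.current = 0     # bytes currently allocated
        self.peak = 0        # high-water mark in bytes
        self._next_ptr = 1   # fake "device pointer" counter for the sketch

    def allocate(self, size):
        ptr = self._next_ptr
        self._next_ptr += 1
        self.live[ptr] = size
        self.current += size
        self.peak = max(self.peak, self.current)
        return ptr

    def deallocate(self, ptr):
        self.current -= self.live.pop(ptr)


tracker = PeakTracker()
a = tracker.allocate(100 * 1024 * 1024)   # e.g. persistent weights
b = tracker.allocate(50 * 1024 * 1024)    # e.g. a temporary workspace
tracker.deallocate(b)                     # workspace freed after use
c = tracker.allocate(20 * 1024 * 1024)    # e.g. activation memory

# The peak (150 MiB) exceeds what is currently allocated (120 MiB),
# which is why polling current usage can miss the true high-water mark.
print(tracker.peak // (1024 * 1024))      # → 150
```

Coarser alternatives, such as polling nvidia-smi while trtexec runs, can also give a rough upper bound, but they include CUDA context overhead and can miss short-lived spikes between samples.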

Thank you.