Peak memory usage during TensorRT execution

Description

Hi,
I’m trying to extract peak memory usage information for a TensorRT engine execution.
I noticed that the log produced when building the engine with trtexec contains a line like this:

[I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 59 MiB, GPU 13 MiB
  1. What exactly does the peak memory usage mean here, and can it be treated as a reliable measure of the peak memory of a TensorRT engine execution? I'm confused because for some models the reported peak is far too small. For example, running trtexec with the attached ONNX model, which is Swin_b with fake quantization, reports only 4 MiB of GPU peak memory (see below), whereas the model itself is about 340 MB (roughly 85 MB when accounting for quantization).
[I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 4 MiB
  2. If it isn't actually equivalent to the real peak memory usage, are there any alternatives? (One possible programmatic check is sketched after this list.)
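
A minimal sketch of one such check, assuming a serialized engine has already been built from the ONNX model (the "model.plan" path is a placeholder): it prints the activation/scratch memory the engine says an execution context will request, alongside the engine file size, which roughly bounds the weight storage. This is not a full peak-memory measurement, just a way to see which parts of the footprint the builder log might or might not be counting.

# Sketch: compare the engine's reported activation/scratch memory with the
# size of the serialized engine. Assumes TensorRT 8.6 Python bindings and a
# prebuilt engine file; "model.plan" is a placeholder path.
import os

import tensorrt as trt

ENGINE_PATH = "model.plan"  # placeholder: engine built from the ONNX model

logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)
with open(ENGINE_PATH, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# device_memory_size is the device memory an execution context requests for
# activations/scratch; it does not include the weights, which are stored in
# the engine itself.
print(f"Activation/scratch memory: {engine.device_memory_size / (1 << 20):.1f} MiB")
print(f"Engine file size (weights + metadata): {os.path.getsize(ENGINE_PATH) / (1 << 20):.1f} MiB")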

Environment

TensorRT Version: 8.6.1
GPU Type: RTX A6000
Nvidia Driver Version: 510.108.03
CUDA Version: 12.1
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:23.07-py3

Relevant Files

ONNX model

Reproduce

To reproduce the Swin_b result:

trtexec --onnx=[onnx_path] --int8

Hi,

Peak memory usage reporting is not accurate in TensorRT 8.6; it will be fixed in an upcoming major release.

Thank you.

Are there any updates on this for newer versions of TensorRT?
I tried using torch.cuda.max_memory_reserved(device) and torch.cuda.max_memory_allocated(device) to measure peak memory usage while running inference with a TRT engine, but the results are very inaccurate and vastly different from what I observe through nvidia-smi.
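
One likely reason for the gap: torch.cuda.max_memory_allocated() and torch.cuda.max_memory_reserved() only track allocations made through PyTorch's caching allocator, while TensorRT allocates device memory through its own allocator (or a user-supplied IGpuAllocator), so those allocations never show up in the PyTorch counters. A device-wide view that lines up better with nvidia-smi can be sampled with NVML. A minimal sketch, assuming the nvidia-ml-py (pynvml) package and GPU index 0; the helper name and sampling interval are illustrative, not part of any TensorRT API:

# Sketch: sample device-wide memory with NVML in a background thread while
# the engine runs, which should track what nvidia-smi reports. Assumes the
# nvidia-ml-py package (import pynvml) and GPU index 0.
import threading
import time

import pynvml


def sample_peak(device_index, interval_s, stop_event, result):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    peak = 0
    while not stop_event.is_set():
        used = pynvml.nvmlDeviceGetMemoryInfo(handle).used  # bytes, whole device
        peak = max(peak, used)
        time.sleep(interval_s)
    result["peak_bytes"] = peak
    pynvml.nvmlShutdown()


stop = threading.Event()
result = {}
sampler = threading.Thread(target=sample_peak, args=(0, 0.01, stop, result))
sampler.start()

# ... run TensorRT inference here (e.g. context.execute_async_v2) ...

stop.set()
sampler.join()
print(f"Peak device memory during the run: {result['peak_bytes'] / (1 << 20):.1f} MiB")

Note that, like nvidia-smi, this measures the whole device, so it also includes the CUDA context overhead and memory used by any other processes on the GPU, and short allocation spikes between samples can be missed.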
