Peak memory usage during TensorRT execution

Description

Hi,
I’m trying to extract peak memory usage information for a TensorRT engine execution.
I noticed that the log produced when building the engine with trtexec contains a line like this:

[I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 59 MiB, GPU 13 MiB
  1. What exactly does the peak memory usage mean here, and can it be treated as a reliable measure of the peak memory of a TensorRT engine execution? I'm confused because for some models the reported peak is far too small. For example, running trtexec with the attached ONNX model, which is Swin_b with fake quantization, reports only 4 MiB of GPU peak memory (see below), whereas the model itself is about 340 MB (roughly 85 MB when accounting for quantization).
[I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 4 MiB
  2. If it isn't actually equivalent to the real peak memory usage, are there any alternatives? (One possible programmatic check is sketched after this list.)
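
A minimal sketch of one such check, assuming a serialized engine has already been built from the ONNX model (the "model.plan" path is a placeholder): it prints the activation/scratch memory the engine says an execution context will request, alongside the engine file size, which roughly bounds the weight storage. This is not a full peak-memory measurement, just a way to see which parts of the footprint the builder log might or might not be counting.

# Sketch: compare the engine's reported activation/scratch memory with the
# size of the serialized engine. Assumes TensorRT 8.6 Python bindings and a
# prebuilt engine file; "model.plan" is a placeholder path.
import os

import tensorrt as trt

ENGINE_PATH = "model.plan"  # placeholder: engine built from the ONNX model

logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)
with open(ENGINE_PATH, "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# device_memory_size is the device memory an execution context requests for
# activations/scratch; it does not include the weights, which are stored in
# the engine itself.
print(f"Activation/scratch memory: {engine.device_memory_size / (1 << 20):.1f} MiB")
print(f"Engine file size (weights + metadata): {os.path.getsize(ENGINE_PATH) / (1 << 20):.1f} MiB")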

Environment

TensorRT Version: 8.6.1
GPU Type: RTX A6000
Nvidia Driver Version: 510.108.03
CUDA Version: 12.1
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:23.07-py3

Relevant Files

ONNX model

Reproduce

To reproduce the Swin_b result:

trtexec --onnx=[onnx_path] --int8

Hi,

Peak memory usage reporting is not accurate in TensorRT 8.6; it will be fixed in an upcoming major release.

Thank you.

Are there any updates on this for newer versions of TensorRT?
I tried using torch.cuda.max_memory_reserved(device) and torch.cuda.max_memory_allocated(device) to measure peak memory usage while running inference with a TRT engine, but the results are very inaccurate and vastly different from what I observe through nvidia-smi.
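
One likely reason for the gap: torch.cuda.max_memory_allocated() and torch.cuda.max_memory_reserved() only track allocations made through PyTorch's caching allocator, while TensorRT allocates device memory through its own allocator (or a user-supplied IGpuAllocator), so those allocations never show up in the PyTorch counters. A device-wide view that lines up better with nvidia-smi can be sampled with NVML. A minimal sketch, assuming the nvidia-ml-py (pynvml) package and GPU index 0; the helper name and sampling interval are illustrative, not part of any TensorRT API:

# Sketch: sample device-wide memory with NVML in a background thread while
# the engine runs, which should track what nvidia-smi reports. Assumes the
# nvidia-ml-py package (import pynvml) and GPU index 0.
import threading
import time

import pynvml


def sample_peak(device_index, interval_s, stop_event, result):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    peak = 0
    while not stop_event.is_set():
        used = pynvml.nvmlDeviceGetMemoryInfo(handle).used  # bytes, whole device
        peak = max(peak, used)
        time.sleep(interval_s)
    result["peak_bytes"] = peak
    pynvml.nvmlShutdown()


stop = threading.Event()
result = {}
sampler = threading.Thread(target=sample_peak, args=(0, 0.01, stop, result))
sampler.start()

# ... run TensorRT inference here (e.g. context.execute_async_v2) ...

stop.set()
sampler.join()
print(f"Peak device memory during the run: {result['peak_bytes'] / (1 << 20):.1f} MiB")

Note that, like nvidia-smi, this measures the whole device, so it also includes the CUDA context overhead and memory used by any other processes on the GPU, and short allocation spikes between samples can be missed.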
