I’m trying to extract peak memory usage information for a TensorRT engine execution.
I found that the log produced when building the engine with
trtexec contains a line like this:
[I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 59 MiB, GPU 13 MiB
- I wonder what exactly the peak memory usage means here, and whether it can be considered reliable data for the peak memory of a TensorRT engine execution. I got confused because with some models the reported peak is far too small. For example, running
trtexec with the attached ONNX model, which is Swin_b with fake quantization, reports only 4 MiB as GPU peak memory as below, whereas the model itself is about 340 MB (approx. 85 MB considering quantization).
[I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 4 MiB
- If it’s not actually equivalent to the real peak memory usage, are there any alternatives?
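For reference, one alternative I have been considering is attaching a custom allocator (TensorRT's IGpuAllocator) to the runtime and tracking the peak myself. The actual TRT/CUDA hookup is an assumption on my side; the sketch below only shows the peak-tracking bookkeeping, with a simulated allocation pattern and names (PeakTracker, on_allocate) that are mine:

```python
class PeakTracker:
    """Tracks current and peak allocated bytes across allocate/free calls.

    Intended to back a custom TensorRT IGpuAllocator, where allocate()
    and deallocate() would forward to cudaMalloc/cudaFree; only the
    bookkeeping is shown here.
    """

    def __init__(self):
        self.current = 0
        self.peak = 0
        self.sizes = {}  # allocation handle -> size in bytes

    def on_allocate(self, handle, size):
        self.sizes[handle] = size
        self.current += size
        self.peak = max(self.peak, self.current)

    def on_free(self, handle):
        self.current -= self.sizes.pop(handle, 0)


# Simulated allocation pattern (handles are arbitrary ids)
t = PeakTracker()
t.on_allocate("a", 100 << 20)  # 100 MiB
t.on_allocate("b", 50 << 20)   # current = 150 MiB -> new peak
t.on_free("a")                 # current drops to 50 MiB
t.on_allocate("c", 30 << 20)   # current = 80 MiB, peak stays 150 MiB
print(t.peak >> 20)  # 150
```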
TensorRT Version: 8.6.1
GPU Type: RTX A6000
Nvidia Driver Version: 510.108.03
CUDA Version: 12.1
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:23.07-py3
To reproduce the Swin_b result:
trtexec --onnx=[onnx_path] --int8
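As a cruder cross-check, I can poll `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` in a loop while trtexec runs and take the maximum; note this reports whole-process device memory (including the CUDA context), so it over-reports relative to the allocator statistic. The sample readings below are made up for illustration; only the max-over-samples logic is shown:

```python
def peak_from_samples(lines):
    """Return the maximum memory.used value (MiB) among samples taken
    from repeated `nvidia-smi --query-gpu=memory.used
    --format=csv,noheader,nounits` invocations.
    (Collecting the samples via subprocess is omitted here.)"""
    return max(int(line.strip()) for line in lines if line.strip())


# Hypothetical samples collected once per 100 ms during a trtexec run
samples = ["312", "1540", "2275", "2275", "890"]
print(peak_from_samples(samples))  # 2275
```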