Why is the serialized TensorRT engine file much larger than the original model file?


Engines built from ONNX or Caffe models are much larger than the original model files.

For example, ResNet-18 is 45 MB as an ONNX or Caffe model, but the serialized TensorRT engine files are 69 MB and 89 MB respectively.


TensorRT Version: 6.0.5
GPU Type: RTX 2070, GTX 1080 Ti
Nvidia Driver Version: 410.73
CUDA Version: 10.0
CUDNN Version: 7.3.1
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.7.4
TensorFlow Version (if applicable): N/A
PyTorch Version (if applicable): N/A
Baremetal or Container (if container which image + tag): Baremetal

Relevant Files

The models can be found in a model zoo or in any GitHub repository; any other model reproduces the issue as well.
TensorRT-6.0.5 is needed.

Steps To Reproduce

cd TensorRT-6.0.5
./bin/trtexec --model=resnet-18.caffemodel --deploy=deploy.prototxt --saveEngine=resnet18-caffe.trt --output=prob
./bin/trtexec --onnx=resnet18.onnx --saveEngine=resnet18-onnx.trt
ls -l

Hi @jockeypan,
TensorRT makes tradeoffs between memory usage and performance that can increase the size of the serialized plan files. This is being actively investigated and improved.
In the meantime, looking at the verbose build log can help you understand where the memory usage is coming from.
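To capture that verbose log, the build can be rerun with trtexec's logging turned up. A sketch, assuming the flags shipped with the TensorRT 6 trtexec binary (--verbose for logging, --workspace in MB to cap the builder's scratch memory, which may also shrink the engine at some cost in performance):

```shell
# Rebuild the ONNX engine with verbose logging; the log shows
# per-layer tactic choices and memory allocations during the build.
./bin/trtexec --onnx=resnet18.onnx \
              --saveEngine=resnet18-onnx.trt \
              --verbose 2>&1 | tee build-verbose.log

# Optionally cap the builder workspace (value in MB) and compare
# the resulting engine size against the default build.
./bin/trtexec --onnx=resnet18.onnx \
              --saveEngine=resnet18-onnx-small.trt \
              --workspace=256
ls -l resnet18-onnx.trt resnet18-onnx-small.trt
```

Searching build-verbose.log for allocation messages should point at which layers or tactics account for the extra size.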
