File size vs. batch size for TensorRT serialized engine files


I converted a model from ONNX to a serialized TensorRT engine file, each time with a different explicit batch size. I expected the engine file to grow with the batch size, but the results contradict that expectation.
All exported engine files use the same precision (FP32) and the same "DLA support mode".
For example, the file sizes according to batch size:
Batch size 1 --> 280 MB
Batch size 10 --> 247 MB
Batch size 20 --> 182 MB
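For context, the conversions were done with trtexec commands roughly like the following. The model path, input tensor name, and shape are placeholders, not the actual values used; TensorRT 7.x trtexec needs --explicitBatch when building from an ONNX model with explicit batch dimensions:

```shell
# Build an FP32 engine from the ONNX model with an explicit batch of 10.
# "model.onnx" and the input spec "input:10x3x224x224" are illustrative.
trtexec --onnx=model.onnx \
        --explicitBatch \
        --shapes=input:10x3x224x224 \
        --saveEngine=model_b10.engine
```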

When I benchmark these engine files using trtexec, the inference times scale as expected with the batch size, so the engines do appear to have been exported with the intended batch sizes.
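The benchmarking step looks something like this (engine filename illustrative); trtexec loads the serialized engine and reports latency and throughput:

```shell
# Load a prebuilt engine and measure inference timing for its baked-in batch size.
trtexec --loadEngine=model_b10.engine
```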

The main question: what causes the file size to shrink as the batch size grows? Is this expected behavior?


TensorRT Version: 7.1
Environment for converting the ONNX model to a TRT engine: Jetson Xavier

Relevant Files

I can’t share the engine models.

DLA support is currently limited to networks running in either FP16 or INT8 mode.
It is not supported for networks in FP32 precision.
Could you provide logs with --verbose enabled in trtexec?
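For reference, a verbose build log can be captured by adding --verbose to the same build command used above (paths and input spec are illustrative):

```shell
# Re-run the ONNX -> engine build with verbose logging and save the output to a file.
trtexec --onnx=model.onnx --explicitBatch --verbose 2>&1 | tee build_log.txt
```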