I converted a model from ONNX to a serialized TensorRT engine file several times, each time with a different explicit batch size. I expected that the bigger the batch size, the bigger the engine file would be, but the results contradict my expectation.
All exported engine files have the same precision (FP32) and the same DLA support mode.
For example, the file sizes according to batch size:
Batch size 1 --> 280 MB
Batch size 10 --> 247 MB
Batch size 20 --> 182 MB
When I benchmark these engine files using trtexec, the inference times scale as expected with the batch size, so the engines do seem to have been built with the intended batch sizes.
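For reference, the engines were built and benchmarked roughly like this (a sketch, not my exact commands; the model file name is a placeholder, and the flags are standard trtexec options in TensorRT 7.1):

```shell
# Build an engine from an ONNX model whose input shape fixes the batch size
# (model_bs10.onnx is a placeholder; with --explicitBatch the batch dimension
# comes from the ONNX graph itself)
trtexec --onnx=model_bs10.onnx --explicitBatch --saveEngine=model_bs10.trt

# Benchmark the serialized engine
trtexec --loadEngine=model_bs10.trt
```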
The main question: what causes the engine files to get smaller as the batch size gets bigger? Is this expected behavior?
TensorRT Version: 7.1
Environment for converting the ONNX model to the TRT engine: Jetson Xavier
I can’t share the engine models.