Question about Trt7.1 variable batch inference

In TRT 5, I serialize the engine with:
./trtexec --onnx=example.onnx --saveEngine=example.trt --fp16 --batch=5
Then I deserialize example.trt in my C++ project and run inference with a variable batch size; the time consumed is directly proportional to the batch size.
For example, running inference twice in one program lifetime: the first call with batch 1 takes 10 ms, and the second call with batch 2 takes 20 ms.

In TRT 7.1, I serialize the engine with:
./trtexec --onnx=example.onnx --saveEngine=example.trt --minShapes=input:1x3x128x128 --optShapes=input:4x3x256x256 --maxShapes=input:5x3x256x256
For example, running inference twice with TRT 7.1: the first call with batch 1 takes 20 ms, and the second call with batch 2 also takes 20 ms.

Is it possible to do variable-batch inference as in TRT 5, where the time consumed increases proportionally with the batch size?
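For context, with a TRT 7 explicit-batch engine built from the shape profile above, each inference call must set the concrete input dimensions on the execution context before executing. A pseudocode-style C++ sketch (not a complete program; it assumes an already deserialized engine, device buffers, and an input binding named `input`):

```cpp
// Sketch only: assumes TensorRT 7.x and an engine built with the
// min/opt/max shape profile shown above.
#include <NvInfer.h>

void infer(nvinfer1::ICudaEngine* engine,
           nvinfer1::IExecutionContext* context,
           void** deviceBindings, int batch)
{
    // Explicit-batch engines require the actual input shape per call.
    const int inputIndex = engine->getBindingIndex("input");
    context->setBindingDimensions(
        inputIndex, nvinfer1::Dims4{batch, 3, 256, 256});

    // executeV2 runs inference with the shapes set above.
    context->executeV2(deviceBindings);
}
```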


Do you need a dynamic input shape?
If the input is fixed to 256x256, please try our implicit batch mode with maximum batch size == 5.

$ /usr/src/tensorrt/bin/trtexec --onnx=example.onnx --maxBatch=5
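With an implicit-batch engine, the batch size is passed at execution time instead of being baked into a shape profile. A pseudocode-style C++ sketch (assumes an engine deserialized from the command above):

```cpp
// Sketch only: assumes a TensorRT 7.x engine built in implicit batch
// mode (e.g. trtexec --maxBatch=5, no explicit shape profiles).
#include <NvInfer.h>

void inferImplicit(nvinfer1::IExecutionContext* context,
                   void** deviceBindings, int batch)
{
    // The batch size (<= maxBatch used at build time) is supplied per
    // call, so the work launched scales with `batch` -- which is why
    // runtime grows roughly in proportion to the batch size.
    context->execute(batch, deviceBindings);
}
```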


Thanks, I will try it later!
By the way, should I export my ONNX model with a dynamic batch size, e.g. set the batch dimension to -1 in the ONNX model?


YES. This will make it much easier to use different batch sizes with TensorRT.