In TRT 5, I serialize the engine with trtexec from the shell:
./trtexec --onnx=example.onnx --saveEngine=example.trt --fp16 --batch=5
Then I deserialize example.trt in my C++ project and run inference with a variable batch size; the inference time is roughly proportional to the batch size.
For example, running inference twice within one program lifetime: the first run with batch size 1 takes 10 ms, and the second run with batch size 2 takes 20 ms.
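For context, my TRT 5 flow looks roughly like this minimal sketch (the per-sample input/output element counts are placeholders for the real bindings of example.onnx; error handling and host/device copies are omitted):

```cpp
#include "NvInfer.h"
#include <cuda_runtime_api.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

using namespace nvinfer1;

struct Logger : public ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
} gLogger;

int main() {
    // Deserialize the engine built with --batch=5 (so maxBatchSize == 5).
    std::ifstream file("example.trt", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());
    IRuntime* runtime = createInferRuntime(gLogger);
    ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
    IExecutionContext* context = engine->createExecutionContext();

    // Buffers sized for the maximum batch; the per-sample sizes below
    // are placeholders, not the real shapes of example.onnx.
    int maxBatch = engine->getMaxBatchSize();
    void* bindings[2];
    cudaMalloc(&bindings[0], maxBatch * 3 * 256 * 256 * sizeof(float));
    cudaMalloc(&bindings[1], maxBatch * 1000 * sizeof(float));

    // Implicit batch: the batch size is an argument of execute(), so each
    // call only processes that many samples (batch 1 ~10 ms, batch 2 ~20 ms).
    const int batches[] = {1, 2};
    for (int batch : batches) {
        context->execute(batch, bindings);
    }

    cudaFree(bindings[0]);
    cudaFree(bindings[1]);
    context->destroy();
    engine->destroy();
    runtime->destroy();
    return 0;
}
```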
In TRT 7.1, I serialize the engine with:
./trtexec --onnx=example.onnx --saveEngine=example.trt --minShapes=input:1x3x128x128 --optShapes=input:4x3x256x256 --maxShapes=input:5x3x256x256
For example, when I run inference twice with TRT 7.1, the first run with batch size 1 takes 20 ms, and the second run with batch size 2 also takes 20 ms.
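My TRT 7.1 flow, again as a minimal sketch (binding index 0 is assumed to be `input`; the output size is a placeholder): the batch is now part of the binding dimensions, which I set on the context before each call.

```cpp
#include "NvInfer.h"
#include <cuda_runtime_api.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

using namespace nvinfer1;

struct Logger : public ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
} gLogger;

int main() {
    std::ifstream file("example.trt", std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());
    IRuntime* runtime = createInferRuntime(gLogger);
    ICudaEngine* engine =
        runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
    IExecutionContext* context = engine->createExecutionContext();

    // Buffers sized for the maximum profile shape 5x3x256x256; the output
    // element count is a placeholder for the real output binding.
    void* bindings[2];
    cudaMalloc(&bindings[0], 5 * 3 * 256 * 256 * sizeof(float));
    cudaMalloc(&bindings[1], 5 * 1000 * sizeof(float));

    // Explicit batch: the batch is the leading binding dimension and must
    // be set before each inference, yet both calls take ~20 ms for me.
    const int batches[] = {1, 2};
    for (int batch : batches) {
        context->setBindingDimensions(0, Dims4{batch, 3, 256, 256});
        context->executeV2(bindings);
    }

    cudaFree(bindings[0]);
    cudaFree(bindings[1]);
    context->destroy();
    engine->destroy();
    runtime->destroy();
    return 0;
}
```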
I'm curious whether it is possible in TRT 7.1 to do variable-batch inference like in TRT 5, where the time consumed grows proportionally with the batch size?