Difference between running inference with trtexec and the TensorRT Python API

Hi,

We reuse the calibration cache from here and the model used in jetson-benchmark.
With INT8, batch size 16, and a 2048 MB workspace we are able to get 1088 fps.
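
In case you want to build the same engine through the TensorRT Python API instead of trtexec, the downloaded calibration cache can be reused through a calibrator that only replays the cache. Below is a minimal sketch, not part of onnx_to_tensorrt.py: the CacheOnlyCalibrator class and the build steps are our illustration, written against the TensorRT 7/8 Python API (newer releases rename some of these calls).

# Sketch: build the INT8 engine with the Python API, reusing the calibration cache.
import tensorrt as trt

ONNX_FILE = "yolov3-tiny-416-bs16.onnx"
CACHE_FILE = "calib_yolov3-tiny-int8-416.bin"   # cache downloaded above
LOGGER = trt.Logger(trt.Logger.INFO)

class CacheOnlyCalibrator(trt.IInt8EntropyCalibrator2):
    """Calibrator that only replays an existing cache; no calibration images needed."""
    def __init__(self, cache_file):
        super().__init__()
        self.cache_file = cache_file

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        # Returning None means no calibration batches; scales come from the cache.
        return None

    def read_calibration_cache(self):
        with open(self.cache_file, "rb") as f:
            return f.read()

    def write_calibration_cache(self, cache):
        pass  # cache already exists, nothing to write

builder = trt.Builder(LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, LOGGER)
with open(ONNX_FILE, "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.max_workspace_size = 2048 << 20          # 2048 MB, same as --workspace=2048
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)           # roughly matches trtexec --best
config.int8_calibrator = CacheOnlyCalibrator(CACHE_FILE)

engine = builder.build_engine(network, config)
with open("yolov3-tiny-416-bs16.trt", "wb") as f:
    f.write(engine.serialize())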

Please check the following instructions:

$ wget https://www.dropbox.com/s/ck9e40b57rd5o14/yolov3-tiny-416.zip
$ unzip yolov3-tiny-416.zip
$ wget https://raw.githubusercontent.com/jkjung-avt/tensorrt_demos/master/yolo/calib_cache/calib_yolov3-tiny-int8-416.bin

YOLOv3_Tiny_benchmark.patch (8.7 KB)

$ /usr/src/tensorrt/bin/trtexec --onnx=yolov3-tiny-416-bs16.onnx --best --workspace=2048 --saveEngine=yolov3-tiny-416-bs16.trt --calib=calib_yolov3-tiny-int8-416.bin
$ git apply YOLOv3_Tiny_benchmark.patch
$ python3 onnx_to_tensorrt.py

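For reference, the patched onnx_to_tensorrt.py essentially deserializes the trtexec-built engine with the Python runtime and times repeated executions over the batch-16 bindings. Below is a rough sketch of that flow; the buffer handling, run count, and timing loop are our assumptions, not the exact patch contents.

# Sketch: load the serialized engine with the Python runtime and measure FPS.
import time
import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

LOGGER = trt.Logger(trt.Logger.INFO)

with open("yolov3-tiny-416-bs16.trt", "rb") as f, trt.Runtime(LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate host/device buffers for every binding (batch 16 is baked into the engine).
bindings, host_bufs, dev_bufs = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    bindings.append(int(dev))
    host_bufs.append(host)
    dev_bufs.append(dev)

stream = cuda.Stream()
# host_bufs[0] (the input binding) would be filled with the preprocessed batch here.

n_runs = 100
start = time.time()
for _ in range(n_runs):
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for host, dev in zip(host_bufs[1:], dev_bufs[1:]):
        cuda.memcpy_dtoh_async(host, dev, stream)
    stream.synchronize()
elapsed = time.time() - start
print("FPS:", n_runs * 16 / elapsed)  # 16 images per execution
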
$ python3 onnx_to_tensorrt.py
Reading engine from file yolov3-tiny-416-bs16.trt
Running inference on image dog.jpg…
FPS: 1088.407443846851
[[125.69800719 217.86413197 254.43353728 296.92521182]
[475.38425396 79.1671842 194.08492385 86.92553251]] [0.80192175 0.70910078] [16 2]

Thanks.