Hi, I have converted the darknet YOLOv3 model to both an FP16 TensorRT engine and an INT8 TensorRT engine, using the COCO dataset as calibration data. However, the two generated .trt files (yolov3-fp16.trt and yolov3-int8.trt) are very similar in size, and their inference speed is also about the same. I am on a Jetson NX2, TensorRT 126.96.36.199, CUDA 10.2.
Have you checked the performance of the FP16 and INT8 models?
Would you mind sharing the profiling results with us first?
Please note that the serialized engine file contains the TensorRT implementation and related information, so INT8 is not guaranteed to produce a smaller file than FP16.
However, you should see much lower memory usage and much better performance when running the INT8 engine.
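For profiling, one option is the trtexec tool that ships with TensorRT (under /usr/src/tensorrt/bin on Jetson). The commands below are a sketch; the engine file names are taken from your post, and the flag set assumes a TensorRT version that supports --loadEngine and --avgRuns:

```shell
# Lock clocks first so the two runs are comparable (Jetson-specific).
sudo jetson_clocks

# Profile the FP16 engine: reports latency and throughput over averaged runs.
/usr/src/tensorrt/bin/trtexec --loadEngine=yolov3-fp16.trt --avgRuns=100

# Profile the INT8 engine the same way, then compare the reported numbers.
/usr/src/tensorrt/bin/trtexec --loadEngine=yolov3-int8.trt --avgRuns=100
```

Comparing the "GPU Compute" latency from the two runs is more reliable than comparing file sizes, since the serialized engine also stores tactics and metadata that do not shrink with precision.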