The inference speed of YOLOv5 TensorRT shows little difference between INT8 and FP16

Description

I use the YOLOv5 model from GitHub - ultralytics/yolov5: YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite,
and the code from tensorrtx/yolov5 at master · wang-xinyu/tensorrtx · GitHub to convert the PyTorch model to .wts and then build an FP16 or INT8 TensorRT engine.
However, I found that the inference speed and memory consumption of the FP16 and INT8 engines are nearly identical: unlike the obvious gap between the FP32 and FP16 engines, there is only about a 10% improvement from FP16 to INT8.
I have tried different cards (2080 Ti / T4) and different models (YOLOv5 and a simple classification model), and the results are the same.
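For context, this is roughly how the two precisions are selected with the TensorRT 7 C++ builder API. It is only a minimal sketch, not the actual tensorrtx build code; the calibrator is assumed to be an IInt8Calibrator implementation supplied elsewhere.

```cpp
// Minimal sketch (TensorRT 7 C++ API) of selecting FP16 vs INT8 at build time.
// NOT the exact tensorrtx code; `calibrator` is assumed to be provided elsewhere.
#include "NvInfer.h"

nvinfer1::ICudaEngine* buildEngine(nvinfer1::IBuilder* builder,
                                   nvinfer1::INetworkDefinition* network,
                                   nvinfer1::IInt8Calibrator* calibrator,
                                   bool useInt8)
{
    nvinfer1::IBuilderConfig* config = builder->createBuilderConfig();
    config->setMaxWorkspaceSize(1ULL << 30);  // 1 GiB workspace

    if (useInt8 && builder->platformHasFastInt8()) {
        // INT8 needs a calibrator (or per-tensor dynamic ranges).
        config->setFlag(nvinfer1::BuilderFlag::kINT8);
        config->setInt8Calibrator(calibrator);
        // Keeping FP16 enabled lets layers without an INT8 implementation
        // fall back to FP16 instead of FP32.
        config->setFlag(nvinfer1::BuilderFlag::kFP16);
    } else if (builder->platformHasFastFp16()) {
        config->setFlag(nvinfer1::BuilderFlag::kFP16);
    }

    nvinfer1::ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    config->destroy();
    return engine;
}
```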

Environment

TensorRT Version: 7.0
GPU Type: 2080 Ti / T4
Nvidia Driver Version:
CUDA Version: 10.0
CUDNN Version: 7.6
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.7
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.7
Baremetal or Container (if container which image + tag):

Relevant Files

models:

convert code:

Steps To Reproduce

1. Download a YOLOv5 model (any version is OK) from Releases · ultralytics/yolov5 · GitHub.

2. Follow tensorrtx/yolov5 at master · wang-xinyu/tensorrtx · GitHub, sections "How to Run, yolov5s as example" and "INT8 Quantization".

3. Serialize the INT8 and FP16 engines, run inference with each, and compare their speed (see the timing sketch after this list).
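When comparing the two engines, it helps to time only the enqueue calls after a warm-up, so the (identical) host-side pre- and post-processing does not mask the difference. Below is a minimal timing sketch under these assumptions: the engine has already been deserialized into an explicit-batch execution context, and the input/output device buffers are bound in `buffers`.

```cpp
// Minimal latency-measurement sketch; assumes an explicit-batch network so
// enqueueV2 applies, and that `buffers` already holds the device bindings.
#include <chrono>
#include <iostream>
#include <cuda_runtime_api.h>
#include "NvInfer.h"

void benchmark(nvinfer1::IExecutionContext* context, void** buffers,
               cudaStream_t stream, int iterations = 1000)
{
    // Warm up so GPU clocks and caches settle before timing.
    for (int i = 0; i < 50; ++i) {
        context->enqueueV2(buffers, stream, nullptr);
    }
    cudaStreamSynchronize(stream);

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < iterations; ++i) {
        context->enqueueV2(buffers, stream, nullptr);
    }
    cudaStreamSynchronize(stream);
    auto end = std::chrono::high_resolution_clock::now();

    double ms = std::chrono::duration<double, std::milli>(end - start).count();
    std::cout << "Average latency: " << ms / iterations << " ms" << std::endl;
}
```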

Hi,

Could you please try the latest TensorRT version, 8.4.3?
Please share with us the ONNX model and trtexec --verbose logs for better debugging.
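For example, commands along these lines build and time both precisions and capture the verbose logs (the model path is a placeholder; without a calibration cache, --int8 uses dummy scales, which is fine for a speed comparison but not for accuracy):

```sh
# FP16 build and timing with verbose logging
trtexec --onnx=yolov5s.onnx --fp16 --verbose > fp16_verbose.log 2>&1

# INT8 build and timing (no calibration cache: perf-only comparison)
trtexec --onnx=yolov5s.onnx --int8 --verbose > int8_verbose.log 2>&1
```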

Thank you.