Description
I use the yolov5 model from GitHub - ultralytics/yolov5: YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite,
and use the code from tensorrtx/yolov5 at master · wang-xinyu/tensorrtx · GitHub to convert the PyTorch model to .wts, then build FP16 and INT8 TensorRT engines from it.
However, I found that the inference speed and memory consumption of the FP16 and INT8 models are very close: unlike the obvious gap between the FP32 and FP16 models, there is only about a 10% improvement from FP16 to INT8.
I tried different cards (2080 Ti / T4) and different models (yolov5 and a simple classification model), and the results are the same.
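For context, the only change I make between the two engines is the precision configuration passed to the builder. The snippet below is a minimal sketch of how that looks with the TensorRT 7 C++ API; it is not the exact tensorrtx code, and the function and variable names (`buildEngine`, `calibrator`, `useInt8`) are placeholders of mine:

```cpp
// Minimal sketch of the precision flags used when building the engine
// (TensorRT 7 C++ API). Assumes `builder`, `config`, `network`, and an INT8
// calibrator already exist, as in the tensorrtx yolov5 sample.
#include "NvInfer.h"

nvinfer1::ICudaEngine* buildEngine(nvinfer1::IBuilder* builder,
                                   nvinfer1::IBuilderConfig* config,
                                   nvinfer1::INetworkDefinition* network,
                                   nvinfer1::IInt8Calibrator* calibrator,
                                   bool useInt8)
{
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16 MB workspace (placeholder value)

    if (useInt8) {
        // INT8 requires a calibrator (or explicit per-tensor dynamic ranges).
        config->setFlag(nvinfer1::BuilderFlag::kINT8);
        config->setInt8Calibrator(calibrator);
    } else {
        config->setFlag(nvinfer1::BuilderFlag::kFP16);
    }

    return builder->buildEngineWithConfig(*network, *config);
}
```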
Environment
TensorRT Version: 7.0
GPU Type: 2080 Ti / T4
Nvidia Driver Version:
CUDA Version: 10.0
CUDNN Version: 7.6
Operating System + Version: Ubuntu 16.04
Python Version (if applicable): 3.7
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.7
Baremetal or Container (if container which image + tag):
Relevant Files
models:
convert code:
Steps To Reproduce
1. Download a yolov5 model (any version is OK) from Releases · ultralytics/yolov5 · GitHub.
2. Follow tensorrtx/yolov5 at master · wang-xinyu/tensorrtx · GitHub, sections "How to Run, yolov5s as example" and "INT8 Quantization".
3. Serialize the INT8 and FP16 engines, run inference with each, and compare their speed (see the timing sketch below).
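For step 3, this is roughly the kind of timing loop I use to compare the two engines. It is a sketch, assuming `context`, `buffers`, and `stream` are already set up as in the tensorrtx yolov5 sample; the warm-up and iteration counts are arbitrary:

```cpp
// Timing sketch for comparing FP16 vs INT8 engines with the implicit-batch
// enqueue() call used by the tensorrtx yolov5 sample (TensorRT 7 C++ API).
#include <chrono>
#include <iostream>
#include <cuda_runtime_api.h>
#include "NvInfer.h"

void benchmark(nvinfer1::IExecutionContext& context, void** buffers,
               cudaStream_t stream, int batchSize)
{
    const int warmup = 50;   // discard the first runs (placeholder count)
    const int iters  = 500;  // timed iterations (placeholder count)

    for (int i = 0; i < warmup; ++i) {
        context.enqueue(batchSize, buffers, stream, nullptr);
    }
    cudaStreamSynchronize(stream);

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < iters; ++i) {
        context.enqueue(batchSize, buffers, stream, nullptr);
    }
    cudaStreamSynchronize(stream);
    auto end = std::chrono::high_resolution_clock::now();

    double ms = std::chrono::duration<double, std::milli>(end - start).count() / iters;
    std::cout << "average inference time: " << ms << " ms" << std::endl;
}
```

With this loop, the FP32 vs FP16 difference is clear, but FP16 vs INT8 only differs by roughly 10% on both the 2080 Ti and the T4.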