Recently, I used TensorRT to accelerate forward inference of the YOLOv5 object detection model with FP16 and INT8 respectively. The specific experimental results are as follows:
Environment version information:
CUDA: 10.0
cuDNN: 7.6.5
TensorRT: 7.0
System version: CentOS 7
Graphics card: RTX 2080 Ti
Reference code:
1. GitHub - ultralytics/yolov5: YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
2. GitHub - wang-xinyu/tensorrtx: Implementation of popular deep learning networks with TensorRT network definition API
Question:
1. Why is GPU memory usage not noticeably reduced when running FP16/INT8-accelerated model inference?
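For context on question 1: with the TensorRT 7 C++ builder API, FP16 and INT8 are build-time flags on the builder config, roughly as in the sketch below (the function and variable names are my own illustration, not code from the repositories above). The sketch also shows the workspace setting: the workspace and the CUDA context are allocated regardless of precision, which may be part of why the total GPU memory footprint does not shrink as much as the lower weight precision would suggest.

```cpp
// Minimal sketch of selecting FP16 or INT8 when building a TensorRT 7 engine.
// buildEngine, network, calibrator, useInt8 are illustrative names.
#include <NvInfer.h>

using namespace nvinfer1;

ICudaEngine* buildEngine(IBuilder* builder, INetworkDefinition* network,
                         IInt8Calibrator* calibrator, bool useInt8)
{
    IBuilderConfig* config = builder->createBuilderConfig();

    // Scratch memory TensorRT may use at runtime; allocated independently of
    // the precision flags below.
    config->setMaxWorkspaceSize(1ULL << 30);  // 1 GiB

    if (useInt8 && builder->platformHasFastInt8())
    {
        config->setFlag(BuilderFlag::kINT8);
        config->setInt8Calibrator(calibrator);  // calibration data is required for INT8
    }
    else if (builder->platformHasFastFp16())
    {
        config->setFlag(BuilderFlag::kFP16);
    }

    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    config->destroy();
    return engine;
}
```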
In addition, I found something particularly interesting: when I switched to the environment listed below, my test results were as follows:
Question:
1. With the newer graphics driver, the GPU resources required by the algorithm are reduced. Is this because the graphics driver was optimized in the upgrade?
2. When TensorRT 7.1 was used for INT8 quantization of the model, forward inference produced no detection output at all. What could be the reason for this? (See the calibration sketch after the environment listing below.)
Environment version information:
CUDA: 10.2
cuDNN: 7.6.5
TensorRT: 7.1
System version: CentOS 7
Graphics card: RTX 2080 Ti
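Regarding question 2 above: when INT8 mode is enabled, TensorRT runs a calibration pass over user-supplied data through an IInt8Calibrator. Below is a minimal entropy-calibrator sketch for the TensorRT 7 C++ API (the class, member, and file names are illustrative assumptions, not code from the repositories above). In my understanding, calibration images preprocessed differently from the inference inputs, or a calibration cache carried over from a different TensorRT version, can easily yield an INT8 engine that outputs no detections, so this is the first thing I would rule out.

```cpp
// Minimal INT8 entropy-calibrator sketch for TensorRT 7 (illustrative names).
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <fstream>
#include <vector>

class Int8Calibrator : public nvinfer1::IInt8EntropyCalibrator2
{
public:
    Int8Calibrator(int batchSize, size_t inputBytes, int totalBatches, const char* cachePath)
        : mBatchSize(batchSize), mInputBytes(inputBytes),
          mTotalBatches(totalBatches), mCachePath(cachePath),
          mHostBatch(inputBytes / sizeof(float), 0.0f)
    {
        cudaMalloc(&mDeviceInput, mInputBytes);
    }
    ~Int8Calibrator() override { cudaFree(mDeviceInput); }

    int getBatchSize() const noexcept override { return mBatchSize; }

    bool getBatch(void* bindings[], const char* /*names*/[], int /*nbBindings*/) noexcept override
    {
        // Stub: fill mHostBatch with the next calibration batch, preprocessed
        // exactly as at inference time (letterbox/resize, normalization,
        // channel order), before the copy below.
        if (mBatchCount >= mTotalBatches)
            return false;  // calibration data exhausted
        cudaMemcpy(mDeviceInput, mHostBatch.data(), mInputBytes, cudaMemcpyHostToDevice);
        bindings[0] = mDeviceInput;
        ++mBatchCount;
        return true;
    }

    const void* readCalibrationCache(size_t& length) noexcept override
    {
        // Only reuse a cache produced by the same TensorRT version; a cache
        // written under 7.0 should not be fed to a 7.1 build.
        mCache.clear();
        std::ifstream in(mCachePath, std::ios::binary);
        if (in)
            mCache.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
        length = mCache.size();
        return mCache.empty() ? nullptr : mCache.data();
    }

    void writeCalibrationCache(const void* cache, size_t length) noexcept override
    {
        std::ofstream out(mCachePath, std::ios::binary);
        out.write(static_cast<const char*>(cache), length);
    }

private:
    int mBatchSize;
    size_t mInputBytes;
    int mTotalBatches;
    const char* mCachePath;
    std::vector<float> mHostBatch;
    std::vector<char> mCache;
    void* mDeviceInput{nullptr};
    int mBatchCount{0};
};
```

As far as I know, serialized engines and calibration caches are not guaranteed to be portable across TensorRT versions, so after moving from 7.0 to 7.1 it seems safest to delete the old calibration cache and serialized engine and regenerate both.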