TensorRT-accelerated model: no obvious drop in GPU memory usage?

Recently I used TensorRT to accelerate the object detection model YOLOv5, running forward inference in FP16 and INT8 respectively. The specific experimental results are as follows:

[image: FP16/INT8 inference test results]
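The figures above are device memory usage while the engine is loaded and running inference. For reference, one way to read this programmatically is sketched below (a minimal Python sketch assuming the pynvml package and GPU index 0; report_memory is a placeholder helper, not code from either repository):

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # the 2080 Ti in this setup

def report_memory(tag):
    """Print used/total device memory in MiB for the given label."""
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"{tag}: {info.used / 2**20:.0f} MiB used / {info.total / 2**20:.0f} MiB total")

report_memory("before engine deserialization")
# ... deserialize the TensorRT engine and run a few inference batches here ...
report_memory("after engine load / during inference")

pynvml.nvmlShutdown()
```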

Environment:

CUDA: 10.0
cuDNN: 7.6.5
TensorRT: 7.0
OS: CentOS 7
GPU: RTX 2080 Ti

Reference Code:

1. GitHub - ultralytics/yolov5: YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
2. GitHub - wang-xinyu/tensorrtx: Implementation of popular deep learning networks with TensorRT network definition API

Question:

1. Why is GPU memory usage not significantly reduced when the model runs inference with FP16/INT8 acceleration?
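For context, the engines are built with precision settings along the lines of the sketch below. This is a minimal Python/ONNX equivalent of the build flow, not the exact C++ code from tensorrtx; the ONNX path, workspace size, and calibrator argument are placeholders:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_engine(onnx_path, precision="fp16", calibrator=None):
    """Parse an ONNX model and build a TensorRT 7 engine at the requested precision."""
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(explicit_batch) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser, \
         builder.create_builder_config() as config:

        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                raise RuntimeError("Failed to parse the ONNX model")

        config.max_workspace_size = 1 << 30        # 1 GiB of builder scratch space

        if precision == "fp16" and builder.platform_has_fast_fp16:
            config.set_flag(trt.BuilderFlag.FP16)
        elif precision == "int8" and builder.platform_has_fast_int8:
            config.set_flag(trt.BuilderFlag.INT8)
            config.int8_calibrator = calibrator    # entropy calibrator, sketched at the end of the post

        return builder.build_engine(network, config)
```

The same build path is used for both precisions; only the precision flag and, for INT8, the calibrator differ.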


In addition, I found the following particularly interesting: when I switched to the environment below, my test results were as follows:

[image: test results with the newer environment]

Questions:

1. After upgrading the graphics driver, the GPU memory required by the algorithm decreased. Is this because the newer driver is better optimized?

2. When TensorRT 7.1 is used to quantize the model to INT8, forward inference produces no detection output at all. What could be the reason for this? (The calibration flow I am assuming is sketched at the end of this post.)

Environment:

CUDA: 10.2
cuDNN: 7.6.5
TensorRT: 7.1
OS: CentOS 7
GPU: RTX 2080 Ti
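For reference, the INT8 calibration flow follows the standard TensorRT entropy-calibrator pattern, roughly as sketched below in Python (the class name, the pre-built batch list, and the cache file path are illustrative placeholders; the actual calibrator in tensorrtx is written against the equivalent C++ API):

```python
import os
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class YoloEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed calibration batches to TensorRT during INT8 calibration."""

    def __init__(self, batches, cache_file="yolov5_int8.cache"):
        super().__init__()
        self.batches = list(batches)      # list of (N, 3, H, W) float32 arrays
        self.index = 0
        self.cache_file = cache_file
        self.device_input = cuda.mem_alloc(self.batches[0].nbytes)

    def get_batch_size(self):
        return self.batches[0].shape[0]

    def get_batch(self, names):
        if self.index >= len(self.batches):
            return None                   # returning None tells TensorRT calibration is done
        cuda.memcpy_htod(self.device_input,
                         np.ascontiguousarray(self.batches[self.index]))
        self.index += 1
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse a previous calibration run if the cache file exists.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

For calibration to be meaningful, the batches must be preprocessed exactly as at inference time (same letterbox/normalization), and the cache file must be regenerated whenever the model or preprocessing changes.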