TensorRT-accelerated model: no obvious drop in GPU memory usage?

Recently I used TensorRT to accelerate the object detection model YOLOv5, running forward inference in FP16 and INT8 respectively. The specific experimental results are as follows:

[image: FP16/INT8 inference test results]
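The figures above are device memory usage while the engine is loaded and running inference. For reference, one way to read this programmatically is sketched below (a minimal Python sketch assuming the pynvml package and GPU index 0; report_memory is a placeholder helper, not code from either repository):

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # the 2080 Ti in this setup

def report_memory(tag):
    """Print used/total device memory in MiB for the given label."""
    info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"{tag}: {info.used / 2**20:.0f} MiB used / {info.total / 2**20:.0f} MiB total")

report_memory("before engine deserialization")
# ... deserialize the TensorRT engine and run a few inference batches here ...
report_memory("after engine load / during inference")

pynvml.nvmlShutdown()
```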

Environment:

CUDA: 10.0
cuDNN: 7.6.5
TensorRT: 7.0
OS: CentOS 7
GPU: RTX 2080 Ti

Reference Code:

1. GitHub - ultralytics/yolov5: YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
2. GitHub - wang-xinyu/tensorrtx: Implementation of popular deep learning networks with TensorRT network definition API

Question:

1. Why is GPU memory usage not significantly reduced when the model runs inference with FP16/INT8 acceleration?
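For context, the engines are built with precision settings along the lines of the sketch below. This is a minimal Python/ONNX equivalent of the build flow, not the exact C++ code from tensorrtx; the ONNX path, workspace size, and calibrator argument are placeholders:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_engine(onnx_path, precision="fp16", calibrator=None):
    """Parse an ONNX model and build a TensorRT 7 engine at the requested precision."""
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(explicit_batch) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser, \
         builder.create_builder_config() as config:

        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                raise RuntimeError("Failed to parse the ONNX model")

        config.max_workspace_size = 1 << 30        # 1 GiB of builder scratch space

        if precision == "fp16" and builder.platform_has_fast_fp16:
            config.set_flag(trt.BuilderFlag.FP16)
        elif precision == "int8" and builder.platform_has_fast_int8:
            config.set_flag(trt.BuilderFlag.INT8)
            config.int8_calibrator = calibrator    # entropy calibrator, sketched at the end of the post

        return builder.build_engine(network, config)
```

The same build path is used for both precisions; only the precision flag and, for INT8, the calibrator differ.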


In addition, I found the following particularly interesting: when I switched to the environment below, my test results were as follows:

[image: test results with the newer environment]

Questions:

1. After upgrading the graphics driver, the GPU memory required by the algorithm decreased. Is this because the newer driver is better optimized?

2. When TensorRT 7.1 is used to quantize the model to INT8, forward inference produces no detection output at all. What could be the reason for this? (The calibration flow I am assuming is sketched at the end of this post.)

Environment:

CUDA: 10.2
cuDNN: 7.6.5
TensorRT: 7.1
OS: CentOS 7
GPU: RTX 2080 Ti
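For reference, the INT8 calibration flow follows the standard TensorRT entropy-calibrator pattern, roughly as sketched below in Python (the class name, the pre-built batch list, and the cache file path are illustrative placeholders; the actual calibrator in tensorrtx is written against the equivalent C++ API):

```python
import os
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

class YoloEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed calibration batches to TensorRT during INT8 calibration."""

    def __init__(self, batches, cache_file="yolov5_int8.cache"):
        super().__init__()
        self.batches = list(batches)      # list of (N, 3, H, W) float32 arrays
        self.index = 0
        self.cache_file = cache_file
        self.device_input = cuda.mem_alloc(self.batches[0].nbytes)

    def get_batch_size(self):
        return self.batches[0].shape[0]

    def get_batch(self, names):
        if self.index >= len(self.batches):
            return None                   # returning None tells TensorRT calibration is done
        cuda.memcpy_htod(self.device_input,
                         np.ascontiguousarray(self.batches[self.index]))
        self.index += 1
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse a previous calibration run if the cache file exists.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```

For calibration to be meaningful, the batches must be preprocessed exactly as at inference time (same letterbox/normalization), and the cache file must be regenerated whenever the model or preprocessing changes.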