Just Sharing!
For those planning to use DeepStream/Triton Server with YOLOv9, I highly recommend quantizing the model (with fine-tuning, i.e. QAT) for improved performance in TensorRT. I have created a repository that adds this quantization support specifically for TensorRT.
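For context, "quantizing (fine-tuning)" here means a QAT-style flow: insert fake-quant (Q/DQ) nodes, calibrate on a few hundred images, fine-tune briefly so the weights adapt to INT8 noise, and export to ONNX so TensorRT can build a true INT8 engine. The repository automates this; the sketch below only illustrates the general idea with NVIDIA's pytorch-quantization toolkit, and `load_yolov9`, `calib_loader`, and the commented fine-tune step are placeholders rather than the repo's actual API.

```python
# Rough QAT-style sketch with NVIDIA's pytorch-quantization toolkit.
# Placeholder names throughout (load_yolov9, calib_loader); the real training
# loop and model loading live in the YOLOv9 code base, not here.
import torch
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

quant_modules.initialize()                  # patch torch.nn so Conv/Linear get Q/DQ wrappers

model = load_yolov9("yolov9-c.pt").cuda()   # placeholder loader
model.eval()

# 1) Calibration: collect activation ranges (amax) on a small calibration set.
for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.disable_quant()
        m.enable_calib()

with torch.no_grad():
    for i, (images, _) in enumerate(calib_loader):   # placeholder dataloader
        model(images.cuda())
        if i >= 100:
            break

for m in model.modules():
    if isinstance(m, quant_nn.TensorQuantizer):
        m.load_calib_amax()
        m.enable_quant()
        m.disable_calib()

# 2) Fine-tune briefly at a low learning rate so the weights adapt to INT8 noise
#    (this is the "fine-tune" part; use the normal YOLOv9 training loop here).
# finetune_one_epoch(model, train_loader)

# 3) Export with QuantizeLinear/DequantizeLinear nodes so TensorRT can build a real INT8 engine.
quant_nn.TensorQuantizer.use_fb_fake_quant = True
dummy = torch.randn(1, 3, 640, 640, device="cuda")
torch.onnx.export(model, dummy, "yolov9-c-qat.onnx",
                  opset_version=13, input_names=["images"], output_names=["outputs"])
```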
Below are performance and accuracy tables summarizing the benefits of this approach:
Performance / Accuracy
TensorRT version: 10.0.0
Accuracy Report

Model: YOLOv9-C

Evaluation Results
| Eval Model | AP | AP50 | Precision | Recall |
|---|---|---|---|---|
| Origin (PyTorch) | 0.529 | 0.699 | 0.743 | 0.634 |
| INT8 (TensorRT) | 0.527 | 0.695 | 0.746 | 0.627 |
Evaluation Comparison
| Eval Model | AP | AP50 | Precision | Recall |
|---|---|---|---|---|
| INT8 (TensorRT) vs Origin (PyTorch) | -0.002 | -0.004 | +0.003 | -0.007 |
Latency/Throughput Report using only TensorRT
Device
| GPU | |
|---|---|
| Device | NVIDIA GeForce RTX 4090 |
| Compute Capability | 8.9 |
| SMs | 128 |
| Device Global Memory | 24207 MiB |
| Application Compute Clock Rate | 2.58 GHz |
| Application Memory Clock Rate | 10.501 GHz |
Latency/Throughput
| Model Name | Batch Size | Latency (99%) | Throughput (qps) | Total Inferences (IPS) |
|---|---|---|---|---|
| YOLOv9-C (FP16) | 1 | 1.25 ms | 803 | 803 |
| | 4 | 3.37 ms | 300 | 1200 |
| | 8 | 6.6 ms | 153 | 1224 |
| | 12 | 10 ms | 99 | 1188 |
| YOLOv9-C (INT8) | 1 | 0.99 ms | 1006 | 1006 |
| | 4 | 2.12 ms | 473 | 1892 |
| | 8 | 3.84 ms | 261 | 2088 |
| | 12 | 5.59 ms | 178 | 2136 |
Latency/Throughput Comparison
| Model Name | Batch Size | Latency (99%) | Throughput (qps) | Total Inferences |
|---|---|---|---|---|
| INT8 vs FP16 | 1 | -20.8% | +25.2% | +25.2% |
| | 4 | -37.1% | +57.7% | +57.7% |
| | 8 | -41.1% | +70.6% | +70.6% |
| | 12 | -46.9% | +79.8% | +78.9% |
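For anyone reproducing numbers like these: once the Q/DQ ONNX is exported, the INT8 engine can be built with `trtexec --int8 --fp16 --onnx=...` or with the TensorRT Python API. Below is a minimal sketch of the latter; the file names, the `images` input name, and the 640x640 shapes are illustrative assumptions, not taken from the repository.

```python
# Minimal TensorRT 10 engine-build sketch (illustrative paths/shapes, not the repo's script).
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(0)        # explicit batch is the default in TensorRT 10
parser = trt.OnnxParser(network, logger)

with open("yolov9-c-qat.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)      # honor the Q/DQ nodes from QAT
config.set_flag(trt.BuilderFlag.FP16)      # let non-quantized layers run in FP16

# Optimization profile covering the batch sizes benchmarked above (1..12).
profile = builder.create_optimization_profile()
profile.set_shape("images", (1, 3, 640, 640), (4, 3, 640, 640), (12, 3, 640, 640))
config.add_optimization_profile(profile)

engine_bytes = builder.build_serialized_network(network, config)
with open("yolov9-c-qat-int8.engine", "wb") as f:
    f.write(engine_bytes)
```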