I can share YOLOv8-small-P2 at 1024x1024 on a Jetson Orin 64GB. The P2 variant (which can also be enabled on YOLOv5) adds an extra detection head on the high-resolution P2 feature-pyramid level; it helps detect small objects, but at a computational expense. The performance logs below were captured on DeepStream 6.4, which uses TensorRT 8.6.2.3. Apparently there is a big performance boost when using DeepStream 7.1, which uses TensorRT 10.3.0.31.
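For context, this is roughly how I go from the P2 model to an engine. A minimal sketch using the Ultralytics CLI; the dataset name and output paths are placeholders for my setup:

```
# Train the P2 variant (Ultralytics ships a yolov8-p2.yaml model config;
# the "s" in the filename selects the small scale)
yolo detect train model=yolov8s-p2.yaml data=my_dataset.yaml imgsz=1024

# Export the trained weights to ONNX at the 1024x1024 input size
yolo export model=runs/detect/train/weights/best.pt format=onnx imgsz=1024
```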
FP32 (50.2 FPS):
[04/10/2025-19:06:13] [I] === Performance summary ===
[04/10/2025-19:06:13] [I] Throughput: 52.432 qps
[04/10/2025-19:06:13] [I] Latency: min = 19.7522 ms, max = 20.2943 ms, mean = 19.9232 ms, median = 19.9209 ms, percentile(90%) = 19.9353 ms, percentile(95%) = 19.9408 ms, percentile(99%) = 19.9521 ms
[04/10/2025-19:06:13] [I] Enqueue Time: min = 1.24695 ms, max = 1.31693 ms, mean = 1.27 ms, median = 1.26599 ms, percentile(90%) = 1.28882 ms, percentile(95%) = 1.29651 ms, percentile(99%) = 1.31677 ms
[04/10/2025-19:06:13] [I] H2D Latency: min = 0.840942 ms, max = 0.866699 ms, mean = 0.852472 ms, median = 0.852844 ms, percentile(90%) = 0.858582 ms, percentile(95%) = 0.86084 ms, percentile(99%) = 0.864502 ms
[04/10/2025-19:06:13] [I] GPU Compute Time: min = 18.8267 ms, max = 19.3242 ms, mean = 18.9532 ms, median = 18.9506 ms, percentile(90%) = 18.9625 ms, percentile(95%) = 18.9667 ms, percentile(99%) = 18.985 ms
[04/10/2025-19:06:13] [I] D2H Latency: min = 0.067627 ms, max = 0.124146 ms, mean = 0.117593 ms, median = 0.116943 ms, percentile(90%) = 0.121582 ms, percentile(95%) = 0.12207 ms, percentile(99%) = 0.123474 ms
[04/10/2025-19:06:13] [I] Total Host Walltime: 3.05157 s
[04/10/2025-19:06:13] [I] Total GPU Compute Time: 3.03251 s
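For reference, these summaries come from trtexec; invocations along these lines should reproduce them (the ONNX filename is an assumption from my export step above):

```
# FP32 baseline: no precision flag, trtexec defaults to FP32
trtexec --onnx=yolov8s-p2-1024.onnx --saveEngine=model_fp32.engine

# FP16: same model, just add --fp16
trtexec --onnx=yolov8s-p2-1024.onnx --fp16 --saveEngine=model_fp16.engine
```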
FP16 (98.2 FPS):
[04/10/2025-19:05:03] [I] === Performance summary ===
[04/10/2025-19:05:03] [I] Throughput: 105.899 qps
[04/10/2025-19:05:03] [I] Latency: min = 10.0845 ms, max = 10.4376 ms, mean = 10.1814 ms, median = 10.1794 ms, percentile(90%) = 10.193 ms, percentile(95%) = 10.1981 ms, percentile(99%) = 10.2063 ms
[04/10/2025-19:05:03] [I] Enqueue Time: min = 1.18372 ms, max = 1.2373 ms, mean = 1.2027 ms, median = 1.19922 ms, percentile(90%) = 1.22137 ms, percentile(95%) = 1.22437 ms, percentile(99%) = 1.22864 ms
[04/10/2025-19:05:03] [I] H2D Latency: min = 0.671265 ms, max = 0.764832 ms, mean = 0.684166 ms, median = 0.682861 ms, percentile(90%) = 0.689697 ms, percentile(95%) = 0.693848 ms, percentile(99%) = 0.704102 ms
[04/10/2025-19:05:03] [I] GPU Compute Time: min = 9.33789 ms, max = 9.66742 ms, mean = 9.41251 ms, median = 9.41162 ms, percentile(90%) = 9.422 ms, percentile(95%) = 9.42456 ms, percentile(99%) = 9.4353 ms
[04/10/2025-19:05:03] [I] D2H Latency: min = 0.0671387 ms, max = 0.097229 ms, mean = 0.0847006 ms, median = 0.0845337 ms, percentile(90%) = 0.0860596 ms, percentile(95%) = 0.0865479 ms, percentile(99%) = 0.0872803 ms
[04/10/2025-19:05:03] [I] Total Host Walltime: 3.03118 s
I can't seem to find the INT8 calibration run I did, but in my experience it roughly halves the FP16 mean latency at the cost of a 6-7% hit to accuracy.
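If you want to try INT8 yourself, a minimal trtexec sketch, assuming you already have a calibration cache (both filenames here are illustrative):

```
# --int8 enables INT8 kernels; --calib points trtexec at an existing
# calibration cache so it doesn't calibrate from scratch
trtexec --onnx=yolov8s-p2-1024.onnx --int8 \
        --calib=calibration.cache --saveEngine=model_int8.engine
```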
For a Jetson Orin Nano, if you are running a lot of feeds you will probably want a nano-sized model. Otherwise, if you use the standard YOLOv5-small, you definitely need to stick to FP16 or INT8; that precision is selected in the nvinfer config, as in the sketch below.
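A minimal sketch of the relevant [property] keys in the gst-nvinfer config; the engine and calibration file names are placeholders:

```
[property]
# network-mode: 0=FP32, 1=INT8, 2=FP16
network-mode=2
model-engine-file=model_b1_gpu0_fp16.engine
# for INT8 (network-mode=1) you also need a calibration table:
# int8-calib-file=calib.table
```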
I pulled the latest DeepStream 7.1 Docker image (nvcr.io/nvidia/deepstream:7.1-triton-multiarch) to run a quick test, launched it roughly as shown below, and re-ran the FP16 benchmark.
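Roughly how I bring the container up on the Orin; the volume mount is just my local model directory, adjust to your setup:

```
docker pull nvcr.io/nvidia/deepstream:7.1-triton-multiarch
# --runtime nvidia exposes the GPU to the container on Jetson
docker run -it --rm --runtime nvidia --network host \
    -v /path/to/models:/models \
    nvcr.io/nvidia/deepstream:7.1-triton-multiarch
```

The FP16 results inside the 7.1 container: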
[04/10/2025-19:08:33] [I] === Performance summary ===
[04/10/2025-19:08:33] [I] Throughput: 111.74 qps
[04/10/2025-19:08:33] [I] Latency: min = 9.53931 ms, max = 9.92157 ms, mean = 9.67231 ms, median = 9.67126 ms, percentile(90%) = 9.68427 ms, percentile(95%) = 9.6875 ms, percentile(99%) = 9.70447 ms
[04/10/2025-19:08:33] [I] Enqueue Time: min = 1.50317 ms, max = 1.5874 ms, mean = 1.53303 ms, median = 1.52954 ms, percentile(90%) = 1.55237 ms, percentile(95%) = 1.55933 ms, percentile(99%) = 1.57642 ms
[04/10/2025-19:08:33] [I] H2D Latency: min = 0.588745 ms, max = 0.685242 ms, mean = 0.604583 ms, median = 0.604248 ms, percentile(90%) = 0.610352 ms, percentile(95%) = 0.61377 ms, percentile(99%) = 0.620605 ms
[04/10/2025-19:08:33] [I] GPU Compute Time: min = 8.86865 ms, max = 9.16568 ms, mean = 8.92204 ms, median = 8.9209 ms, percentile(90%) = 8.92993 ms, percentile(95%) = 8.93347 ms, percentile(99%) = 8.9425 ms
[04/10/2025-19:08:33] [I] D2H Latency: min = 0.0664062 ms, max = 0.159454 ms, mean = 0.145687 ms, median = 0.145874 ms, percentile(90%) = 0.14917 ms, percentile(95%) = 0.150146 ms, percentile(99%) = 0.151611 ms
[04/10/2025-19:08:33] [I] Total Host Walltime: 3.03382 s
[04/10/2025-19:08:33] [I] Total GPU Compute Time: 3.02457 s
Only slightly more performant: about 5-6% higher throughput (111.7 vs 105.9 qps) and roughly 0.5 ms less mean GPU compute time than on DeepStream 6.4.