Improved DeepStream for YOLO models

Hi, I would like to share my work on improving the DeepStream implementation for YOLO models provided by NVIDIA.

Link: GitHub - marcoslucianops/DeepStream-Yolo: NVIDIA DeepStream SDK 6.0 configuration for YOLO models

Improvements on this repository

  • Darknet CFG params parser (no need to edit nvdsparsebbox_Yolo.cpp or another file)
  • Support for new_coords, beta_nms and scale_x_y params
  • Support for new models
  • Support for new layers
  • Support for new activations
  • Support for convolutional groups
  • Support for INT8 calibration
  • Support for non-square models
  • Support for implicit and channel layers (YOLOR)
  • YOLOv5 6.0 native support
  • Initial YOLOR native support
  • Model benchmarks
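
To illustrate the idea behind a Darknet CFG params parser, here is a minimal Python sketch. It is not the repository's code (the real parser is part of the DeepStream plugin); the function names `parse_darknet_cfg` and `yolo_params` are hypothetical, and the default values used for missing keys are assumptions based on Darknet's usual behaviour. The point is that `new_coords`, `beta_nms` and `scale_x_y` can be read from each `[yolo]` block of the `.cfg` file instead of being hard-coded in `nvdsparsebbox_Yolo.cpp`.

```python
def parse_darknet_cfg(text):
    """Parse Darknet .cfg text into an ordered list of (section, params) pairs.

    Section names repeat in Darknet files (many [convolutional] blocks),
    so a list is used instead of a dict keyed by section name.
    """
    blocks = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith(";"):
            continue
        if line.startswith("[") and line.endswith("]"):
            blocks.append((line[1:-1], {}))
        elif "=" in line and blocks:
            key, value = line.split("=", 1)
            blocks[-1][1][key.strip()] = value.strip()
    return blocks


def yolo_params(blocks):
    """Collect the decoding parameters of every [yolo] block.

    The fallback values (0, 1.0, 0.6) are assumed defaults for this sketch.
    """
    out = []
    for name, params in blocks:
        if name != "yolo":
            continue
        mask = [int(v) for v in params.get("mask", "").split(",") if v.strip()]
        out.append({
            "new_coords": int(params.get("new_coords", 0)),
            "scale_x_y": float(params.get("scale_x_y", 1.0)),
            "beta_nms": float(params.get("beta_nms", 0.6)),
            "mask": mask,
        })
    return out
```

Calling `yolo_params(parse_darknet_cfg(open("yolov4.cfg").read()))` (the file name is only an example) would return one dictionary per detection head, in the order the heads appear in the file.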

Future updates

  • New documentation for multiple models
  • DeepStream tutorials
  • Native PP-YOLO support
  • GPU NMS
  • Dynamic batch-size

Tested models

Benchmarks

Board: NVIDIA GTX 1050 4GB (Mobile)

YOLOR-CSP performance comparison

                       DeepStream  PyTorch
FPS (without display)  13.32       10.07
FPS (with display)     12.63       9.41

YOLOv5n performance comparison

                       DeepStream  TensorRTx  Ultralytics
FPS (without display)  110.25      87.42      97.19
FPS (with display)     105.62      73.07      50.37

More

Model (DeepStream)  Precision  Resolution  IoU=0.5:0.95  IoU=0.5  IoU=0.75  FPS (without display)
YOLOR-P6            FP32       1280        0.478         0.663    0.519     5.53
YOLOR-CSP-X*        FP32       640         0.473         0.664    0.513     7.59
YOLOR-CSP-X         FP32       640         0.470         0.661    0.507     7.52
YOLOR-CSP*          FP32       640         0.459         0.652    0.496     13.28
YOLOR-CSP           FP32       640         0.449         0.639    0.483     13.32
YOLOv5x6 6.0        FP32       1280        0.504         0.681    0.547     2.22
YOLOv5l6 6.0        FP32       1280        0.492         0.670    0.535     4.05
YOLOv5m6 6.0        FP32       1280        0.463         0.642    0.504     7.54
YOLOv5s6 6.0        FP32       1280        0.394         0.572    0.424     18.64
YOLOv5n6 6.0        FP32       1280        0.294         0.452    0.314     26.94
YOLOv5x 6.0         FP32       640         0.469         0.654    0.509     8.24
YOLOv5l 6.0         FP32       640         0.450         0.634    0.487     14.96
YOLOv5m 6.0         FP32       640         0.415         0.601    0.448     28.30
YOLOv5s 6.0         FP32       640         0.334         0.516    0.355     63.55
YOLOv5n 6.0         FP32       640         0.250         0.417    0.260     110.25
YOLOv4-P6           FP32       1280        0.499         0.685    0.542     2.57
YOLOv4-P5           FP32       896         0.472         0.659    0.513     5.48
YOLOv4-CSP-X-SWISH  FP32       640         0.473         0.664    0.513     7.51
YOLOv4-CSP-SWISH    FP32       640         0.459         0.652    0.496     13.13
YOLOv4x-MISH        FP32       640         0.459         0.650    0.495     7.53
YOLOv4-CSP          FP32       640         0.440         0.632    0.474     13.19
YOLOv4              FP32       608         0.498         0.740    0.549     12.18
YOLOv4-Tiny         FP32       416         0.215         0.403    0.206     201.20
YOLOv3-SPP          FP32       608         0.411         0.686    0.433     12.22
YOLOv3-Tiny-PRN     FP32       416         0.167         0.382    0.125     277.14
YOLOv3              FP32       608         0.377         0.672    0.385     12.51
YOLOv3-Tiny         FP32       416         0.095         0.203    0.079     218.42
YOLOv2              FP32       608         0.286         0.541    0.273     25.28
YOLOv2-Tiny         FP32       416         0.102         0.258    0.061     231.36

Amazing! I really appreciate your work on DeepStream! This is helpful, and I’ll share it internally!

One question: in your test table, the YOLOv4/v3/v2 models are from Darknet YOLO, right?

Yes


The YOLO decoder was moved from the CPU to the GPU to get better performance.

Results

4x faster inference on AGX using the YOLOv5n model in FP16 mode

[Images: CPU YOLO Decoder vs. GPU YOLO Decoder]

Update:

  • GPU YOLO Decoder (moved from CPU to GPU to get better performance) #138
  • Improved NMS #142
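
For readers wondering what a YOLO decoder computes, here is a rough Python sketch of the per-cell box decoding in the Darknet convention, including the `scale_x_y` and `new_coords` params. This is not the code from #138; the function name `decode_box` and its argument names are made up for illustration, and the sketch assumes that with `new_coords` the raw outputs have already passed through a logistic activation. Each cell is decoded independently of the others, which is why this step is a natural fit for the GPU.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(tx, ty, tw, th, col, row, grid_w, grid_h,
               anchor_w, anchor_h, net_w, net_h,
               scale_x_y=1.0, new_coords=False):
    """Decode one raw prediction into a normalized (cx, cy, w, h) box.

    col/row are the grid cell indices, anchor_* are in network pixels,
    and the returned values are fractions of the network input size.
    """
    if not new_coords:
        # Classic YOLO: x/y offsets go through a sigmoid here.
        tx, ty = sigmoid(tx), sigmoid(ty)
    # scale_x_y stretches the offset range around the cell centre.
    shift = 0.5 * (scale_x_y - 1.0)
    cx = (col + tx * scale_x_y - shift) / grid_w
    cy = (row + ty * scale_x_y - shift) / grid_h
    if new_coords:
        # new_coords: width/height are (2 * t)^2 times the anchor.
        w = (2.0 * tw) ** 2 * anchor_w / net_w
        h = (2.0 * th) ** 2 * anchor_h / net_h
    else:
        # Classic YOLO: width/height are exp(t) times the anchor.
        w = math.exp(tw) * anchor_w / net_w
        h = math.exp(th) * anchor_h / net_h
    return cx, cy, w, h
```

A full decoder would also multiply objectness by the class scores, drop low-confidence boxes, and pass the remainder to NMS.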

I’m using TAO to train a yolo_v4_tiny model and exported it to an .etlt file. I couldn’t find much information on how to deploy it to a Jetson Nano (the DS6 samples also don’t include samples for YOLO). Does DeepStream 6 not support it?

For YOLOv4 trained in TAO, you should use deepstream_tao_apps.

@marcoslucianops thanks.
Your work, DeepStream-Yolo, is an improvement on deepstream_tao_apps, is that correct?

No, it’s an improvement on objectDetector_Yolo.
