How to use custom object detector i.e nvinfer in ds-example

@hidayat.rhman

There is a guide to deploying YOLOv4 on DeepStream that could serve as a reference for you.

However, the way to parse the model output for YOLOv5 may be very different. Here are some hints that may be helpful.

Hint1: Contents of output may be different

In the guide for YOLOv4, the output carries 2 types of information: 1) the x1, y1, x2, y2 coordinates of each bounding box and 2) the confidence of each bounding box for every class.
The “YOLOv5” output may include different types of information. For example, it may include 3 types of data: bounding boxes, location confidences and class confidences.
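
If your model does emit a separate location confidence and class confidences, one common way to get a single per-class score comparable to the YOLOv4 output is to multiply the two. A minimal sketch in Python, assuming that convention fits your exported model (the function name `combine_confidences` is hypothetical, not part of DeepStream):

```python
def combine_confidences(box_conf, class_confs):
    """Return per-class scores as box_conf * class_conf (assumed convention)."""
    return [box_conf * c for c in class_confs]


# One candidate box with a location confidence and two class confidences.
scores = combine_confidences(0.5, [0.5, 0.25])
# Index of the class with the highest combined score.
best_class = max(range(len(scores)), key=scores.__getitem__)
```

Check how your model was exported before relying on this; some exports may already fold the location confidence into the class scores.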

Hint2: Shape of output may be different

The output shapes of YOLOv4 are [batch_size, num_boxes, 1, 4] and [batch_size, num_boxes, num_classes]. But the “YOLOv5” output may be a single tensor of shape [batch_size, num_boxes, 5 + num_classes], or it may be split into two tensors: [batch_size, num_boxes, 5] and [batch_size, num_boxes, num_classes].
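
For the combined [batch_size, num_boxes, 5 + num_classes] case, each box row has to be split before it can be handled like the YOLOv4 outputs. A small sketch in Python, assuming the row is laid out as 4 box values, then 1 confidence, then the class confidences (this ordering is an assumption, so verify it against your model; `split_row` is a hypothetical helper):

```python
def split_row(row, num_classes):
    """Split one [5 + num_classes] row into (box, box_conf, class_confs).

    Assumed layout: row[0:4] = box, row[4] = confidence, row[5:] = classes.
    """
    if len(row) != 5 + num_classes:
        raise ValueError("expected %d values, got %d" % (5 + num_classes, len(row)))
    return row[:4], row[4], row[5:]


# Example row for a model with 2 classes.
box, box_conf, class_confs = split_row([10.0, 20.0, 30.0, 40.0, 0.9, 0.1, 0.8], 2)
```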

Hint3: Values of output may be different

All coordinates (x1, y1 and x2, y2) from YOLOv4 are normalized to the range [0.0, 1.0], but coordinates from the “YOLOv5” may not be normalized.
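
If the coordinates turn out to be in pixels of the network input, they can be scaled into [0.0, 1.0] by dividing by the input size. A minimal sketch in Python, assuming pixel units relative to the network input; `net_width` and `net_height` are hypothetical parameter names for that input size:

```python
def normalize_box(box, net_width, net_height):
    """Scale 4 box values from network-input pixels to the range [0.0, 1.0].

    Values at index 0 and 2 are horizontal, values at index 1 and 3 vertical.
    """
    a, b, c, d = box
    return [a / net_width, b / net_height, c / net_width, d / net_height]


# Example: a box on a 640x512 network input.
normalized = normalize_box([160.0, 128.0, 320.0, 256.0], 640, 512)
```

The same scaling applies to both corner and center formats, since in either one the first and third values are horizontal and the second and fourth are vertical.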

Hint4: Bounding box format may be different

The bounding box for YOLOv4 is [x1, y1, x2, y2], where (x1, y1) is the top-left corner of the box and (x2, y2) is the bottom-right corner.
But a bounding box from the “YOLOv5” could be like this: [x-center, y-center, width, height].
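
If the boxes do come in center format, they can be converted to the corner format used in the YOLOv4 guide. A minimal sketch in Python, assuming the order [x-center, y-center, width, height] (the helper name `center_to_corners` is hypothetical):

```python
def center_to_corners(box):
    """Convert [x_center, y_center, width, height] to [x1, y1, x2, y2]."""
    cx, cy, w, h = box
    half_w = w / 2.0
    half_h = h / 2.0
    # (x1, y1) is the top-left corner, (x2, y2) the bottom-right corner.
    return [cx - half_w, cy - half_h, cx + half_w, cy + half_h]


corners = center_to_corners([50.0, 40.0, 20.0, 10.0])
```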
