Tune DeepStream YOLOv3_tiny parameters to perform like the darknet version without DeepStream

Hello,

Before starting with DeepStream, I had been using YOLOv3_tiny from darknet and managed to get pretty good results. However, when running the same model with the same weights in DeepStream, I get completely different output: I get fewer bounding boxes, even though the probability threshold is the same, and the boxes I do get are much thinner and longer than the ones I get with darknet.

Are there any parameters I can modify to improve the YOLOv3_tiny output and make it more similar to the darknet one?

In this link (Question NVIDIA Forum - Google Drive) I share two frames with the detections from the two ways of running YOLOv3_tiny. In them it is possible to see the differences I described above.

Thank you in advance.

Are the anchors and masks the same?

nvdsparsebbox_Yolo.cpp

extern "C" bool NvDsInferParseCustomYoloV3Tiny(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList)
{
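    // These anchor and mask values must match the ones in the darknet .cfg used for training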
    static const std::vector<float> kANCHORS = {
        10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319};
    static const std::vector<std::vector<int>> kMASKS = {
        {3, 4, 5},
        //{0, 1, 2}}; // as per output result, select {1,2,3}
        {1, 2, 3}};

    return NvDsInferParseYoloV3 (
        outputLayersInfo, networkInfo, detectionParams, objectList,
        kANCHORS, kMASKS);
}

The tensor output is in outputLayersInfo. You can compare it with the darknet output tensor to determine whether it is an output parser problem or something else (like network build, input pre-processing, …).
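As a rough sketch (this is not part of the sample; the output file path is arbitrary and it assumes the output tensor is FP32), something like the following could be added to the custom parser to dump one layer's raw values and diff them against the darknet layer output:

#include <cstdio>
#include "nvdsinfer_custom_impl.h"

// Dump the raw values of one output layer to a text file.
// Note: on newer DeepStream releases the dims field is named inferDims.
static void dumpLayer(const NvDsInferLayerInfo& layer, const char* path)
{
    const float* data = static_cast<const float*>(layer.buffer);
    unsigned int numElems = 1;
    for (unsigned int i = 0; i < layer.dims.numDims; ++i)
        numElems *= layer.dims.d[i];

    FILE* f = std::fopen(path, "w");
    if (!f) return;
    for (unsigned int i = 0; i < numElems; ++i)
        std::fprintf(f, "%f\n", data[i]);
    std::fclose(f);
}

// Example call at the top of NvDsInferParseCustomYoloV3Tiny():
//   dumpLayer(outputLayersInfo[0], "/tmp/yolo_layer0.txt");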

No, I have already set the correct anchors and masks.
They are the same ones used for training.

Is it possible that the pre-processing resize leads to the different result? You can change the image resolution to the network width/height, then run inference with both DeepStream and darknet, and compare the results.

Another thing to note: in the function “convertBBox()”, bbox coordinates are converted from “float” to “int”.
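As a small standalone illustration (my own example, not the sample code), storing bbox coordinates in integer fields drops the sub-pixel part, which matters most for very thin or small boxes:

#include <cstdio>

struct IntBox { int left, top, width, height; };

int main()
{
    // Hypothetical parser output in float (network coordinates)
    float left = 574.6f, top = 181.7f, width = 8.4f, height = 41.9f;

    // Assigning to int fields truncates toward zero
    IntBox b{ (int)left, (int)top, (int)width, (int)height };

    std::printf("float: %.1f %.1f %.1f %.1f\n", left, top, width, height);
    std::printf("int:   %d %d %d %d\n", b.left, b.top, b.width, b.height);
    return 0;
}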

Did you set the config to INT8 mode? Can you try FP32 mode first?

Hi jgchxg9o,

Have you tried our suggestions? Is there any result you can share?

If I set the image size to a square size divisible by 32 (e.g. 608x608) it works fine, but if I use the original image size in the streammux (e.g. 1280x720), I also get distorted bounding boxes. Any idea why?

I use a custom-trained YOLO detector.

I tried to reproduce the problem with the official NVIDIA samples, but it did not occur there. So I guess there must be some parameters that are not set correctly… Do you have an idea where?

I think the reason I can’t reproduce it with your official examples is that they don’t produce very thin and small bounding boxes.

However, I could reproduce another thing that seems related.

When I resize the image in the streammux to a square size divisible by 32 (see case A), I get many more detected bounding boxes than if I let the inference plugin do the resizing (see case B).

For reproducing:
case A:

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=1
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=416
height=416
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

case B:

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=1
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1280
height=416
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

Apart from the odd bounding boxes, I also noticed that I get far fewer detections when not resizing the image in the streammux.

I have checked what ChrisDing said about the “convertBBox()” function and printed the values of the BBs at two different points: right after detection inside convertBBox, and again just before my plugin uses the detections. By doing this I figured out that the rescaling of the BBs only works for square images, since all the dimensions of a BB (x, y, w, h) are scaled with the same ratio.

For example, using an input image of size 1280x720, a BB obtained downstream has these dimensions → (x1:1210, y1:383, w1:16, h1:86), while the print inside convertBBox showed that the same BB had these dimensions → (x0:575, y0:182, w0:8, h0:41). Calculating the ratios between the two BBs for each dimension I obtain: x1/x0=2.1043, y1/y0=2.1043, w1/w0=2, h1/h0=2.0976.
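To make the expected numbers concrete, here is a small standalone sketch (just my illustration, not DeepStream code, and it assumes a 608x608 network input, which is what the observed ~2.1 ratio in X suggests) computing the per-axis factors I would expect for a 1280x720 source:

#include <cstdio>

int main()
{
    const float netW = 608.f, netH = 608.f;   // assumed network input size
    const float srcW = 1280.f, srcH = 720.f;  // streammux / source resolution

    // Per-axis factors I would expect for a non-square source
    const float scaleX = srcW / netW;          // 1280/608 ≈ 2.1
    const float scaleY = srcH / netH;          // 720/608 ≈ 1.18

    // The BB printed inside convertBBox() (network coordinates)
    const float x0 = 575.f, y0 = 182.f, w0 = 8.f, h0 = 41.f;

    std::printf("expected: x=%.0f y=%.0f w=%.0f h=%.0f\n",
                x0 * scaleX, y0 * scaleY, w0 * scaleX, h0 * scaleY);
    // Observed downstream instead: x=1210 y=383 w=16 h=86,
    // i.e. both axes scaled by the same ~2.1 factor
    return 0;
}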

This means the BBs are not rescaled correctly: for a 1280x720 input the ratio in X should be about 2.1, but the ratio in Y should only be about 1.1, yet both axes are scaled with the same factor. Is there any point where I can check/modify the scaling ratio?

Apart from this, I experience the same as rog07o4z: when using a square image size divisible by 32, like 1280x1280, I get good scaling of the BBs and more detections than when using an image size of 1280x720.

Hi @ChrisDing, @kayccc, have you verified what I reported? I still think the resizing operation in the inference plugin is responsible for the mentioned problem.