YOLOv3 accuracy in DeepStream is lower than in darknet

I use YOLOv3, trained with the parameter letter_box = 1. When testing in darknet, I use the -letter_box flag and I'm happy with the result.
When I use DeepStream, I set maintain-aspect-ratio = 1. As a result, objects that touch the bottom edge of the image are not detected. If I set maintain-aspect-ratio = 0, these objects do get a bounding box, but overall accuracy drops.

My code:

static std::vector<NvDsInferParseObjectInfo>
nonMaximumSuppression(const float nmsThresh, std::vector<NvDsInferParseObjectInfo> binfo, const uint& netW, const uint& netH)
{
    auto overlap1D = [](float x1min, float x1max, float x2min, float x2max) -> float {
        if (x1min > x2min)
        {
            std::swap(x1min, x2min);
            std::swap(x1max, x2max);
        }
        return x1max < x2min ? 0 : std::min(x1max, x2max) - x2min;
    };
    auto computeIoU
        = [&overlap1D](NvDsInferParseObjectInfo& bbox1, NvDsInferParseObjectInfo& bbox2) -> float {
        float overlapX
            = overlap1D(bbox1.left, bbox1.left + bbox1.width, bbox2.left, bbox2.left + bbox2.width);
        float overlapY
            = overlap1D(bbox1.top, bbox1.top + bbox1.height, bbox2.top, bbox2.top + bbox2.height);
        float area1 = (bbox1.width) * (bbox1.height);
        float area2 = (bbox2.width) * (bbox2.height);
        float overlap2D = overlapX * overlapY;
        float u = area1 + area2 - overlap2D;
        return u == 0 ? 0 : overlap2D / u;
    };

    std::stable_sort(binfo.begin(), binfo.end(),
                     [](const NvDsInferParseObjectInfo& b1, const NvDsInferParseObjectInfo& b2) {
                         return b1.detectionConfidence > b2.detectionConfidence;
                     });
    std::vector<NvDsInferParseObjectInfo> out;
    std::vector<NvDsInferParseObjectInfo> out_out;
    for (auto i : binfo)
    {
        bool keep = true;
        for (auto j : out)
        {
            if (keep)
            {
                float overlap = computeIoU(i, j);
                keep = overlap <= nmsThresh;
            }
            else
                break;
        }
        if (keep)
        {
            out.push_back(i);
            //out_out.push_back(i);
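            // Replace the kept box with a square around the same center. The side is
            // the longer box edge, capped at min(netW, netH) - 1.
            float centerx = i.left + (i.width / 2);
            float centery = i.top + (i.height / 2);
            // Cast both arguments so std::min sees matching types.
            float side = std::min(float(std::min(netW - 1, netH - 1)), float(std::max(i.width, i.height)));
            i.left = std::max(centerx - (side / 2),float(0.0));
            i.top = std::max(centery - (side / 2),float(0.0));
            i.width = side;
            i.height = side;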
            //i.left = clamp(i.left, 1, (netW - 1));
            //i.top = clamp(i.top, 1, (netH - 1));
            printf("points: %i, %i, %i, %i, wh: %i, %i\n",int(i.left), int(i.top), int(i.left+i.width), int(i.top+i.height), int(i.width), int(i.height));
            if ((i.left+i.width) > (netW-1)) {
                i.left = i.left + (netW - 1 - (i.left + i.width));
                printf(" change left %i\n", int(i.left));
            }
            if ((i.top+i.height) > (netH-1)) {
                i.top = i.top + (netH - 1 -(i.top + i.height));
                printf(" change top %i\n", int(i.top));
            }
            if (i.left < 0) {
                i.left = 0;
            }
            if (i.top < 0) {
                i.top = 0;
            }
            out_out.push_back(i);
        }
    }
    return out_out;
}
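The square-box step in the loop above can be isolated into a small helper, which makes the clamping at the right and bottom edges easier to check. This is only a sketch: `Box` and `squareInsideNet` are hypothetical names, not part of the DeepStream API, and the float fields are an assumption about the box type.

```cpp
#include <algorithm>

// Hypothetical stand-in for the left/top/width/height fields of a detection.
struct Box {
    float left, top, width, height;
};

// Turn a box into a square with the same center, then shift it so that it
// lies inside [0, netW - 1] x [0, netH - 1].
inline Box squareInsideNet(const Box& in, float netW, float netH)
{
    // The square can never be larger than the shorter side of the input.
    const float maxSide = std::min(netW, netH) - 1.0f;
    const float side = std::min(maxSide, std::max(in.width, in.height));
    const float cx = in.left + in.width / 2.0f;
    const float cy = in.top + in.height / 2.0f;

    Box out;
    out.width = side;
    out.height = side;
    // Clamp the top-left corner so the whole square stays inside the input.
    out.left = std::min(std::max(cx - side / 2.0f, 0.0f), netW - 1.0f - side);
    out.top = std::min(std::max(cy - side / 2.0f, 0.0f), netH - 1.0f - side);
    return out;
}
```

Clamping the corner to a single range handles all four edges the same way, so the right and bottom limits cannot drift apart.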

Hi,

A few questions:

  1. Are you using the standard YOLO models or a custom one?
  2. If it is a custom one, can this issue be reproduced with the standard models?
  3. Also, what do the changes in the NMS correspond to? Can you give more context on why these changes are needed?

  1. I am using a custom model (YOLOv3 with a different number of classes).
  2. With the standard model this is less noticeable, because the model is less accurate (it predicts a box that does not stop at the bottom edge of the picture, but the box is much smaller than the object).
  3. I don't fully understand why the next stage (classification, after the detector) is less accurate when the aspect ratio = 1 and the box is rectangular rather than square. To train the classifier, I used pictures with different aspect ratios. To get detections with an aspect ratio of 1, I changed the NMS so that it outputs square boxes instead of rectangular ones. But the problem remains for boxes that touch the bottom edge of the picture.
  4. I now use other networks to solve the problem, but I would still like to figure out where I went wrong.

Sorry, I don't quite understand the changes you seem to have made in the training phase. Can you add some pictures that help explain your problem? A picture of what you expect and one of the output you are seeing would help.

  1. Which training framework did you use to train the custom model?
  2. When you test inference in your training framework, how is the aspect ratio handled for the test images? What kind of padding is used?
  3. When you say the box is not square, do you expect the proposals from the model to always be square boxes?
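
To make the padding question concrete: an aspect-preserving resize can place its padding in more than one way, and a model trained with one placement may see objects near the border at shifted positions if inference uses another. The sketch below is illustrative only. `Letterbox`, `computeLetterbox`, and `toNetY` are hypothetical names, and which placement darknet or DeepStream applies in a given version is something to verify, not something this snippet asserts.

```cpp
#include <algorithm>

// Scale and padding offsets that map source-image pixels to network-input pixels.
struct Letterbox {
    float scale;
    float padX;
    float padY;
};

// Aspect-preserving resize of a srcW x srcH image into a netW x netH input.
// centered = true  : padding is split equally between both sides
// centered = false : image stays at the top-left, all padding goes right/bottom
inline Letterbox computeLetterbox(float srcW, float srcH, float netW, float netH, bool centered)
{
    Letterbox lb;
    lb.scale = std::min(netW / srcW, netH / srcH);
    const float freeX = netW - srcW * lb.scale;  // unused width after scaling
    const float freeY = netH - srcH * lb.scale;  // unused height after scaling
    lb.padX = centered ? freeX / 2.0f : 0.0f;
    lb.padY = centered ? freeY / 2.0f : 0.0f;
    return lb;
}

// Network-input row that corresponds to source-image row y.
inline float toNetY(const Letterbox& lb, float y)
{
    return y * lb.scale + lb.padY;
}
```

For an 832x416 frame and a 416x416 input, the bottom edge of the frame lands at row 312 with centered padding and at row 208 with bottom-only padding, so objects touching that edge sit in quite different places in the two cases.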

Hi chernenko.vasiliy,

Could you reply to the questions in the last comment? Or is this no longer an issue?
Please post an update so we can move this issue forward. Thanks.