YoloV4 BBox confidence values are wrong

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) → NVIDIA GeForce GTX 1650
• DeepStream Version → 6.1
• JetPack Version (valid for Jetson only) → NA
• TensorRT Version → TensorRT 8.2.5.1
• NVIDIA GPU Driver Version (valid for GPU only) → NVIDIA driver 515
• Issue Type (questions, new requirements, bugs) → bugs

Hello,
I followed this updated YoloV4 guide: YoloV4 Manual

I also followed this repo: Following Repo

The application runs, but objectList.size() reports 0 objects, which means I’m not getting any bounding boxes, as seen here:

After some debugging, it turns out the confidence values the network produces are always less than 0.1.

I printed each object's confidence next to the configured threshold. Here’s the DeepStream output:

maxProb=0.00624847 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00129032 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.71065e-05 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.72853e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000273466 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00196075 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00395584 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00308037 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00318336 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00321007 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00319672 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.0032177 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00320625 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00320625 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00319481 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00319672 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00317001 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00313568 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00318336 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00424194 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00452423 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000881195 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.46031e-05 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.36442e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=4.13656e-05 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000129342 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000457287 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000349283 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000365019 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000364304 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000365973 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000362635 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000362635 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000363111 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000366449 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000365496 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000365496 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000365973 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000362158 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000420094 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.000361919 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0.00010705 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=4.17233e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.96046e-08 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.96046e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=3.8743e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.78165e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.48363e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.36442e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.36442e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.36442e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.30481e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.30481e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.36442e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.36442e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.42402e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.42402e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.42402e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.36442e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=7.09295e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=8.46386e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.2517e-06 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.78814e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=0 detectionParams.perClassPreclusterThreshold[maxIndex]=0
maxProb=0 detectionParams.perClassPreclusterThreshold[maxIndex]=0
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.78814e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.78814e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=1.19209e-07 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4
maxProb=5.96046e-08 detectionParams.perClassPreclusterThreshold[maxIndex]=0.4

Here are the files that may help with your inspection:

nvdsinfer_yolov4parser.cpp file:

/*
 * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstring>
#include <fstream>
#include <iostream>
#include <unordered_map>
#include "nvdsinfer_custom_impl.h"

static const int NUM_CLASSES_YOLO = 80;

float clamp(const float val, const float minVal, const float maxVal)
{
    assert(minVal <= maxVal);
    return std::min(maxVal, std::max(minVal, val));
}

extern "C" bool NvDsInferParseCustomYoloV4(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList);


/* YOLOv4 implementations */
static NvDsInferParseObjectInfo convertBBoxYoloV4(const float& bx1, const float& by1, const float& bx2,
                                     const float& by2, const uint& netW, const uint& netH)
{
    NvDsInferParseObjectInfo b;
    // Restore coordinates to network input resolution

    float x1 = bx1 * netW;
    float y1 = by1 * netH;
    float x2 = bx2 * netW;
    float y2 = by2 * netH;

    x1 = clamp(x1, 0, netW);
    y1 = clamp(y1, 0, netH);
    x2 = clamp(x2, 0, netW);
    y2 = clamp(y2, 0, netH);

    b.left = x1;
    b.width = clamp(x2 - x1, 0, netW);
    b.top = y1;
    b.height = clamp(y2 - y1, 0, netH);

    return b;
}

static void addBBoxProposalYoloV4(const float bx, const float by, const float bw, const float bh,
                     const uint& netW, const uint& netH, const int maxIndex,
                     const float maxProb, std::vector<NvDsInferParseObjectInfo>& binfo)
{
    NvDsInferParseObjectInfo bbi = convertBBoxYoloV4(bx, by, bw, bh, netW, netH);

    std::cerr << "b.left="<<bbi.left<< " b.width=" << bbi.width << " b.top=" << bbi.top  << " b.height=" << bbi.height;

    if (bbi.width < 1 || bbi.height < 1) return;

    bbi.detectionConfidence = maxProb;


    bbi.classId = maxIndex;
    std::cerr << "maxProb="<<maxProb;
    std::cerr << "maxIndex="<<maxIndex;

    binfo.push_back(bbi);
}

static std::vector<NvDsInferParseObjectInfo>
decodeYoloV4Tensor(
    const float* boxes, const float* scores,
    const uint num_bboxes, NvDsInferParseDetectionParams const& detectionParams,
    const uint& netW, const uint& netH)
{
    std::vector<NvDsInferParseObjectInfo> binfo;

    uint bbox_location = 0;
    uint score_location = 0;
    for (uint b = 0; b < num_bboxes; ++b)
    {
        float bx1 = boxes[bbox_location];
        float by1 = boxes[bbox_location + 1];
        float bx2 = boxes[bbox_location + 2];
        float by2 = boxes[bbox_location + 3];

        float maxProb = 0.0f;
        int maxIndex = -1;

        for (uint c = 0; c < detectionParams.numClassesConfigured; ++c)
        {
            float prob = scores[score_location + c];
            if (prob > maxProb)
            {
                maxProb = prob;
                maxIndex = c;
            }
        }

        std::cerr << "maxProb="<<maxProb<< " detectionParams.perClassPreclusterThreshold[maxIndex]=" << detectionParams.perClassPreclusterThreshold[maxIndex] << std::endl;

        if (maxProb > detectionParams.perClassPreclusterThreshold[maxIndex])
        {
            addBBoxProposalYoloV4(bx1, by1, bx2, by2, netW, netH, maxIndex, maxProb, binfo);
        }

        bbox_location += 4;
        score_location += detectionParams.numClassesConfigured;
    }

    return binfo;
}

extern "C" bool NvDsInferParseCustomYoloV4(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList)
{
    if (NUM_CLASSES_YOLO != detectionParams.numClassesConfigured)
    {
        std::cerr << "WARNING: Num classes mismatch. Configured:"
                  << detectionParams.numClassesConfigured
                  << ", detected by network: " << NUM_CLASSES_YOLO << std::endl;
    }

    std::vector<NvDsInferParseObjectInfo> objects;

    const NvDsInferLayerInfo &boxes = outputLayersInfo[0]; // num_boxes x 4
    const NvDsInferLayerInfo &scores = outputLayersInfo[1]; // num_boxes x num_classes

    // 3 dimensional: [num_boxes, 1, 4]
    assert(boxes.inferDims.numDims == 3);
    // 2 dimensional: [num_boxes, num_classes]
    assert(scores.inferDims.numDims == 2);

    // The second dimension should be num_classes
    assert(detectionParams.numClassesConfigured == scores.inferDims.d[1]);
    
    uint num_bboxes = boxes.inferDims.d[0];

    // std::cout << "Network Info: " << networkInfo.height << "  " << networkInfo.width << std::endl;

    std::vector<NvDsInferParseObjectInfo> outObjs =
        decodeYoloV4Tensor(
            (const float*)(boxes.buffer), (const float*)(scores.buffer), num_bboxes, detectionParams,
            networkInfo.width, networkInfo.height);

    objects.insert(objects.end(), outObjs.begin(), outObjs.end());

    objectList = objects;
    
    // std::cerr << "After postprocessing objects.size()" << objects.size() << std::endl;
    // std::cerr << "After postprocessing objectList.size()" << objectList.size() << std::endl;

    return true;
}
/* YOLOv4 implementations end*/


/* Check that the custom function has been defined correctly */
CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomYoloV4);
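
One detail in decodeYoloV4Tensor is worth noting: when no class score is greater than 0, maxIndex stays at -1 and perClassPreclusterThreshold[maxIndex] is read out of bounds. That is most likely where the two "maxProb=0 ... =0" lines in the log above come from. Below is a minimal sketch of a guarded arg-max. BestClass, argmaxClass and passesThreshold are hypothetical helper names used only for illustration, not part of the DeepStream API, and the threshold vector stands in for detectionParams.perClassPreclusterThreshold.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not part of DeepStream).
struct BestClass {
    int index;    // -1 when no class score is strictly positive
    float score;  // 0.0f when index is -1
};

// Find the class with the highest strictly positive score in scores[0..numClasses).
static BestClass argmaxClass(const float* scores, std::size_t numClasses)
{
    assert(scores != nullptr);
    BestClass best{-1, 0.0f};
    for (std::size_t c = 0; c < numClasses; ++c) {
        if (scores[c] > best.score) {
            best.score = scores[c];
            best.index = static_cast<int>(c);
        }
    }
    return best;
}

// True only when a class was found and its score beats that class's threshold.
static bool passesThreshold(const BestClass& best,
                            const std::vector<float>& perClassThreshold)
{
    // Never index the threshold vector with -1 or past its end.
    if (best.index < 0 ||
        static_cast<std::size_t>(best.index) >= perClassThreshold.size()) {
        return false;
    }
    return best.score > perClassThreshold[best.index];
}
```

In the parser loop, the debug print and the threshold comparison would both go behind the same index check, so a box with all-zero scores is skipped without touching the threshold array.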

config_YoloV4.txt file:

################################################################################
#
# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
################################################################################

# Following properties are mandatory when engine files are not specified:
#   int8-calib-file(Only in INT8), model-file-format
#   Caffemodel mandatory properties: model-file, proto-file, output-blob-names
#   UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
#   ONNX: onnx-file
#
# Mandatory properties for detectors:
#   num-detected-classes
#
# Optional properties for detectors:
#   cluster-mode(Default=Group Rectangles), interval(Primary mode only, Default=0)
#   custom-lib-path
#   parse-bbox-func-name
#
# Mandatory properties for classifiers:
#   classifier-threshold, is-classifier
#
# Optional properties for classifiers:
#   classifier-async-mode(Secondary mode only, Default=false)
#
# Optional properties in secondary mode:
#   operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
#   input-object-min-width, input-object-min-height, input-object-max-width,
#   input-object-max-height
#
# Following properties are always recommended:
#   batch-size(Default=1)
#
# Other optional properties:
#   net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
#   model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
#   mean-file, gie-unique-id(Default=0), offsets, process-mode (Default=1 i.e. primary),
#   custom-lib-path, network-mode(Default=0 i.e FP32)
#
# The values in the config file are overridden by values set through GObject
# properties.

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
#0=RGB, 1=BGR
model-color-format=0
model-engine-file=../../data/models/YoloV4/yolov4_-1_3_640_640_dynamic.engine
labelfile-path=../../data/models/YoloV4/labels.txt
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=80
gie-unique-id=1
network-type=0
is-classifier=0
## 0=Group Rectangles, 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV4
custom-lib-path=../nvdsinfer_yolov4parser/libnvds_YoloV4Parser.so
#scaling-filter=0
#scaling-compute-hw=0

[class-attrs-all]
nms-iou-threshold=0.6
pre-cluster-threshold=0.4
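
For reference, nvinfer applies net-scale-factor per pixel as y = net-scale-factor * (x - mean). With no offsets set in this config, the mean is assumed to be 0, so the value above (approximately 1/255) maps 8-bit pixels into roughly [0, 1]. A small sketch of that arithmetic follows. normalizePixel is a hypothetical name used only to illustrate the formula and is not a DeepStream function.

```cpp
#include <cassert>

// Sketch of the per-pixel normalization: y = netScaleFactor * (x - offset).
// The offset defaults to 0 because the config sets no "offsets" property.
static float normalizePixel(unsigned char x, float netScaleFactor, float offset = 0.0f)
{
    return netScaleFactor * (static_cast<float>(x) - offset);
}

// The value from config_YoloV4.txt.
static const float kNetScaleFactor = 0.0039215697906911373f;
```

With this factor, normalizePixel(0, kNetScaleFactor) is 0 and normalizePixel(255, kNetScaleFactor) is very close to 1, so if the model was trained on inputs in a different range, the scores would be affected before the parser ever runs.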

Hi @yousef.hesham1, which application are you using with your model in DeepStream? Could you provide us with your stream? Thanks

Hello @yuweiw, thanks for the quick response!
I’m using the apps/deepstream-imagedata-multistream sample application.
If by stream you mean the video I’m testing with, I’m using the following 2 files as part of a 6-stream test:

file:///opt/nvidia/deepstream/deepstream-6.1/samples/streams/sample_1080p_h264.mp4
file:///opt/nvidia/deepstream/deepstream-6.1/samples/streams/sample_1080p_h265.mp4

And they produce no bounding boxes. Could it be an ONNX conversion issue?

====>Could it be an ONNX conversion issue?
It might be. You can try testing with our yolov4 TRT model and post-process function. Thanks
Please refer to the link below to download our yolov4 model.
https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps

@yuweiw I tried the TAO deployable yolov4 model (yolov4_resnet18_395.etlt) and here’s the output I’m getting:

Can you assist with this please? Also I can upload the converted ONNX I’ve used when I first encountered this issue if it needs inspection.

From the video you attached, we can see that the labels show up, so maybe the way you draw the bbox is wrong. You can refer to the draw_bounding_boxes function in the deepstream-imagedata-multistream file. You can also draw anything on the picture as a test.

Hello @yuweiw
As I understand it, this function is only used to draw on image data extracted from the object meta, which is then saved to disk.

The actual ObjectList bounding boxes are drawn by NVIDIA plugins in the pipeline. Please correct me if I’m wrong.

In any case, I’ve double-checked the function and it’s exactly the same as in the original sample.

Could it be a conversion issue?

====>Please correct me if I’m wrong.
You are right. We use the osd plugin to draw.
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvdsosd.html
===>Could it be a conversion issue?
I think you used tao deployable yolov4 model, so there is no conversion.
Could you share your changes to deepstream-imagedata-multistream? Thanks

I did use the TAO deployable yolov4 model, and the results are shown above in the comments.
Not many changes were made other than adding the tracker config.

The reason may be your post-process algorithm. I used the yolov4 model from our TAO app to run the deepstream-imagedata-multistream demo. With your post-process function, it cannot draw the bbox and the number of recognized objects is 0. But with the post-process function from the TAO app, it works well. You can try it from the link below:
https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/master/post_processor/nvdsinfer_custombboxparser_tao.cpp
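
To illustrate why the two post-process functions can disagree: the parser posted earlier in this thread reads two tensors (boxes and per-class scores), while a model that already runs NMS inside the engine would hand over a different set of outputs. The following is a minimal sketch under the assumption that the TAO model emits four post-NMS tensors (keep count, normalized boxes, scores, class ids). Detection and parseBatchedNms are hypothetical stand-ins for illustration, not the code from the linked file, and the tensor layout is an assumption to verify against the engine's actual output layers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for NvDsInferParseObjectInfo.
struct Detection {
    int classId;
    float left, top, width, height;
    float confidence;
};

// Assumed layout: keepCount[0] valid detections; boxes hold normalized
// [x1, y1, x2, y2] per detection; scores and classes hold one value each.
static std::vector<Detection> parseBatchedNms(
    const int* keepCount, const float* boxes, const float* scores,
    const float* classes, float threshold, unsigned numClasses,
    unsigned netW, unsigned netH)
{
    assert(keepCount && boxes && scores && classes);
    std::vector<Detection> out;
    const float maxX = static_cast<float>(netW) - 1.0f;
    const float maxY = static_cast<float>(netH) - 1.0f;
    for (int i = 0; i < keepCount[0]; ++i) {
        if (scores[i] < threshold) continue;  // below the confidence threshold
        const int cls = static_cast<int>(classes[i]);
        if (cls < 0 || static_cast<unsigned>(cls) >= numClasses) continue;
        // Scale to network resolution and clip to the valid pixel range.
        const float x1 = std::min(maxX, std::max(0.0f, boxes[4 * i + 0] * netW));
        const float y1 = std::min(maxY, std::max(0.0f, boxes[4 * i + 1] * netH));
        const float x2 = std::min(maxX, std::max(0.0f, boxes[4 * i + 2] * netW));
        const float y2 = std::min(maxY, std::max(0.0f, boxes[4 * i + 3] * netH));
        if (x2 <= x1 || y2 <= y1) continue;  // skip degenerate boxes
        out.push_back({cls, x1, y1, x2 - x1, y2 - y1, scores[i]});
    }
    return out;
}
```

If the engine's outputs follow this kind of layout, feeding them to a parser that treats outputLayersInfo[0] as boxes and outputLayersInfo[1] as class scores would read the wrong buffers, which is consistent with 0 objects being reported.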