Different result with custom tiny_yolo model in DeepStream vs cv2.dnn module

I trained a tiny YOLO model using Darknet. I first tested it on Ubuntu with OpenCV's dnn module, loading it via cv2.dnn.readNetFromDarknet(configPath, weightsPath).

Then I took the same weights and cfg files, plus the example from the DeepStream SDK, to a Jetson Nano. Running the same video file, I get completely different results: objects are found in different frames, classes are confused, and in general the result is worse in DeepStream.

What could be wrong?

Hi,

One possible reason is the model-color-format.
OpenCV describes images in BGR order, but the default model-color-format in the YOLO sample is RGB.
Would you mind updating the parameter to BGR (model-color-format=1) first, to see if that helps?

More color format options can be found here:
https://docs.nvidia.com/metropolis/deepstream/dev-guide/DeepStream_Development_Guide/baggage/group__gstreamer__nvinfer__context.html#ga5a15c6c94b72e12f26a198e6ac20c4f1
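For reference, the change is a single line in the [property] group of the nvinfer config file (a sketch; the enum values 0=RGB and 1=BGR are as described in the documentation linked above):

```
[property]
# 0=RGB, 1=BGR
model-color-format=1
```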

Thanks.

Hello AastaLLL!
I work together with dimaretunskiy and I’m helping him build a custom C app that uses YOLOv3-tiny on the Jetson Nano.

First of all, thanks for the reply! Setting the correct model color mode did indeed help; our model was BGR. But even then the results were drastically different from what we got from OpenCV. Our app seemed to be much worse, even with the tracker in the pipeline.

Yesterday I found out that I’d missed the anchor/mask settings in the YOLO module.
We had them in the cfg file, but from what I understood, the custom YOLO implementation overrides the settings from the cfg.

Specifically I mean this function here:

extern "C" bool NvDsInferParseCustomYoloV3Tiny(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList)
{
    static const std::vector<float> kANCHORS = {
        10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319};
    static const std::vector<std::vector<int>> kMASKS = {
        {3, 4, 5},
        //{0, 1, 2}}; // as per output result, select {1,2,3}
        {1, 2, 3}};

    return NvDsInferParseYoloV3 (
        outputLayersInfo, networkInfo, detectionParams, objectList,
        kANCHORS, kMASKS);
}

Our cfg uses {3, 4, 5} and {0, 1, 2} for the mask values, so when I changed the second mask to {0, 1, 2} the results got much, much better.
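For anyone else hitting this, here is a small pure-Python sketch of how the masks index into the flat kANCHORS list from the snippet above (the pairing logic follows the standard YOLOv3 convention of interleaved width/height values):

```python
# Flat list of anchor (width, height) values, as in kANCHORS above
anchors = [10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319]

def anchors_for_mask(mask, anchors):
    """Select the (w, h) anchor pairs that one YOLO output layer uses."""
    pairs = list(zip(anchors[0::2], anchors[1::2]))  # [(10, 14), (23, 27), ...]
    return [pairs[i] for i in mask]

# The coarse output layer uses the large anchors:
print(anchors_for_mask([3, 4, 5], anchors))  # [(81, 82), (135, 169), (344, 319)]
# With mask {0, 1, 2} the fine layer gets the small anchors, matching our cfg:
print(anchors_for_mask([0, 1, 2], anchors))  # [(10, 14), (23, 27), (37, 58)]
```

With the wrong mask ({1, 2, 3}), the fine layer would decode its predictions against anchor shapes it was never trained with, which explains the degraded boxes we saw.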

Now the results are comparable to OpenCV’s, but they still differ somewhat. We can probably work with that, but we wanted to ask whether we might have missed something else. I’ve done everything from Custom_YOLO_Model_in_the_DeepStream_YOLO_App.pdf to set up our model.

Maybe you could point us to other things we should check?

It’s not that we want things to be identical to OpenCV, we just want the detection to work. Another thing I’d like to know is whether we can expect completely identical results across different YOLO implementations, provided we use the same configs and weights, or whether some deviation is to be expected.

Thanks!

Hi,

Maybe you can also check these parameters:

static bool NvDsInferParseYoloV3()
{
    // Bounding-box overlap (NMS) threshold
    const float kNMS_THRESH = 0.5f;
    // Detection confidence threshold
    const float kPROB_THRESH = 0.7f;
    // Predicted boxes per grid cell
    const uint kNUM_BBOXES = 3;
}
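To illustrate how those two thresholds interact, here is a minimal greedy-NMS sketch in pure Python (an illustration only, not the DeepStream implementation; boxes here are hypothetical (x1, y1, x2, y2, score) tuples):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(dets, prob_thresh=0.7, nms_thresh=0.5):
    """Drop low-confidence boxes, then suppress strongly overlapping ones."""
    dets = sorted((d for d in dets if d[4] >= prob_thresh),
                  key=lambda d: d[4], reverse=True)
    kept = []
    for d in dets:
        if all(iou(d[:4], k[:4]) < nms_thresh for k in kept):
            kept.append(d)
    return kept

# Two strongly overlapping boxes plus one weak detection:
dets = [(10, 10, 50, 50, 0.9), (12, 12, 52, 52, 0.8), (100, 100, 140, 140, 0.4)]
print(nms(dets))  # only the 0.9 box survives
```

If these values differ between the OpenCV script and the DeepStream parser, the set of reported detections will differ even when the raw network outputs are identical.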

Thanks.

Hi, we also tested YOLOv3 pretrained on the COCO dataset and got different results.
DeepStream: the sample deepstream-app with default parameters.
The code for the dnn version is attached at the end of this post.

The upper picture is the result from dnn, the lower one from DeepStream. The person in the car is not recognized in the DeepStream version.

Here is a Google Drive link with the Python script, DeepStream cfg, and YOLO model:
https://drive.google.com/drive/folders/1T5VyBY0j6lX0wlwdItTRlHN9vi505IR9?usp=sharing

What do we need to do to get the same result in DeepStream?

As Aasta mentioned, have you made sure the thresholds used in the OpenCV case and in DeepStream are the same?

Also, from the screenshot you’ve posted, the DeepStream sample seems to be pixelated. Are you using the same video to test in both cases? Can you also post the DeepStream config files you are using?

Hi,

  1. Code in OpenCV with NMS:

idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.3)

Code in DeepStream:

    const uint kNUM_BBOXES = 3;
    static const float kNMS_THRESH = 0.3f;
    static const float kPROB_THRESH = 0.5f;

  2. I use the same video in both cases.

  3. DeepStream cfg:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=1
rows=4
columns=2
width=1280
height=720
gpu-id=0
nvbuf-memory-type=0

[source0]
enable=1
type=3
uri=file://sample_720p.mp4
num-sources=8
gpu-id=0
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=5
sync=1
source-id=0
gpu-id=0
qos=0
nvbuf-memory-type=0
overlay-id=1

[sink1]
enable=1
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
sync=0
#iframeinterval=10
bitrate=2000000
output-file=out_tiled.mp4
source-id=0

[osd]
enable=1
gpu-id=0
border-width=1
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=8
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1280
height=720
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

[primary-gie]
enable=1
gpu-id=0
batch-size=8
interval=4
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer.txt

[tracker]
enable=1
tracker-width=480
tracker-height=272
#ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_iou.so
ll-lib-file=/opt/nvidia/deepstream/deepstream-4.0/lib/libnvds_mot_klt.so
#ll-config-file required for IOU only
#ll-config-file=iou_config.txt
gpu-id=0

[tests]
file-loop=0
[property]

#0=RGB, 1=BGR
model-color-format=1

## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0

custom-network-config=/home/jetson2/test_yolov3/yolov3.cfg
model-file=/home/jetson2/test_yolov3/yolov3.weights
model-engine-file=/home/jetson2/test_yolov3/model_b8_fp32.engine
labelfile-path=/home/jetson2/test_yolov3/yolov3.names

gpu-id=0
net-scale-factor=1

num-detected-classes=80
gie-unique-id=1
is-classifier=0
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV3
custom-lib-path=/home/jetson2/polifem/sdk/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so

On frame 28, OpenCV finds:

  1. A bus with good confidence (about 0.99)
  2. 2 persons
  3. 6 cars

DeepStream finds:

  1. 2 persons
  2. 5 cars

By any chance, do you have these variables enabled in the config file?

detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

I know you’ve posted the config, but it’s better to double check. In any case, it’s hard to guarantee a 1:1 result between the two frameworks, since the underlying image-processing libraries are different. You can also notice in your comment here - https://devtalk.nvidia.com/default/topic/1066154/deepstream-sdk/different-result-with-custom-tiny_yolo-model-in-deepstream-vs-cv2-dnn-module/post/5403786/#5403786 - that the backpack is detected in the DS case but not in the other one.

Do you notice this behavior regularly, with DS performing poorly?