Yolo2 App for DS

I was following the sample code at https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/tree/master/yolo/samples/objectDetector_YoloV3

I modified the respective variables and was able to build, but the detection boxes are all incorrect. Is there anywhere I can find a yolov2 sample app for DS?

Hi,

Can you provide more information regarding the changes you have made and what's incorrect in the output detections?

So one of the obvious things that needs to be done is to create a yolov2 engine. I created that. The rest of the computations are pretty much the same as objectDetector_YoloV3. I configured the paths accordingly and I can see that the YoloV2 engine is getting used. The detection boxes are very small. Are we supposed to scale the bounding box values for yolov2? Any help in this regard is much appreciated.

Well, I would double-check that all the parameters here match the engine you have generated -

https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/master/yolo/samples/objectDetector_YoloV3/nvdsinfer_custom_impl_YoloV3/nvdsparsebbox_YoloV3.cpp#L280

Along with that, keep in mind that the decodeTensor operations vary between yolov2 and yolov3.

See differences here -
https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/master/yolo/lib/yolov2.cpp#L59
https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/master/yolo/lib/yolov3.cpp#L59
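
Roughly, the difference in the anchor lookup looks like this (a minimal sketch paraphrasing those two files, not the exact repo code):

// yolov3 (three output scales): each scale selects its anchors through
// a mask, and the anchors in the cfg are already in input resolution.
const float pw = anchors[mask[b] * 2];
const float ph = anchors[mask[b] * 2 + 1];

// yolov2 (single output, no masks): anchors are indexed directly, and
// the cfg expresses them in grid-cell units, so they must be scaled to
// input resolution before use.
const float pw = anchors[2 * b];
const float ph = anchors[2 * b + 1];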

Yes, that is exactly what I’ve done.

In https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/3a8957b2d985d7fc2498a0f070832eb145e809ca/yolo/samples/objectDetector_YoloV3/nvdsinfer_custom_impl_YoloV3/nvdsparsebbox_YoloV3.cpp#L160:

const float bx
    = x + detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 0)];
const float by
    = y + detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 1)];
const float bw
    = pw * exp(detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 2)]);
const float bh
    = ph * exp(detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 3)]);
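
For reference, these indices follow the reference parser, where (if I'm reading it right) numGridCells = gridSize * gridSize and bbindex = y * gridSize + x:

// Element c of box b at grid cell (x, y) sits at
// detections[(b * (5 + numOutputClasses) + c) * numGridCells + y * gridSize + x],
// i.e. a CHW buffer whose channel index is b * (5 + numOutputClasses) + c.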

In https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/3a8957b2d985d7fc2498a0f070832eb145e809ca/yolo/samples/objectDetector_YoloV3/nvdsinfer_custom_impl_YoloV3/nvdsparsebbox_YoloV3.cpp#L210:

static int outputBlobIndex1 = -1;
static const int NUM_CLASSES_YOLO_V2 = 80;
static bool classMismatchWarn = false;

if (outputBlobIndex1 == -1)
{
    for (uint i = 0; i < outputLayersInfo.size(); i++)
    {
        if (strcmp(outputLayersInfo[i].layerName, "region_32") == 0)
        {
            outputBlobIndex1 = i;
            break;
        }
    }
    if (outputBlobIndex1 == -1)
    {
        std::cerr << "Could not find output layer 'region_32' while parsing" << std::endl;
        return false;
    }
}

And in https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/3a8957b2d985d7fc2498a0f070832eb145e809ca/yolo/samples/objectDetector_YoloV3/nvdsinfer_custom_impl_YoloV3/nvdsparsebbox_YoloV3.cpp#L275:

std::vector<float*> outputBlobs(1, nullptr);
outputBlobs.at(0) = (float*) outputLayersInfo[outputBlobIndex1].buffer;

const float kNMS_THRESH = 0.4f;
const float kPROB_THRESH = 0.5f;
const uint kNUM_BBOXES = 5;
const uint kINPUT_H = 608;
const uint kINPUT_W = 608;
const uint kSTRIDE_1 = 32;
const uint kGRID_SIZE_1 = kINPUT_H / kSTRIDE_1;

std::vector<NvDsInferParseObjectInfo> objects;
std::vector<NvDsInferParseObjectInfo> objects1
    = decodeTensor(outputBlobs.at(0), kMASK_1, kANCHORS, kGRID_SIZE_1, kSTRIDE_1, kNUM_BBOXES,
                   NUM_CLASSES_YOLO_V2, kPROB_THRESH, kINPUT_W, kINPUT_H);

objectList.clear();

objectList = nmsAllClasses(kNMS_THRESH, objects1, NUM_CLASSES_YOLO_V2);

But my results are all goofed up.

What about the anchors? Have you updated them? Anchors differ between yolov2 and yolov3, and they also need to be in network input resolution.

Have a look at how it's done here - https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/3a8957b2d985d7fc2498a0f070832eb145e809ca/yolo/lib/yolo.cpp#L294

Hi, thanks for pointing that out to me. I modified the code accordingly and multiplied the anchors by the stride, but the results are the same.

FYI, my anchors for YoloV2 are defined thus:

const std::vector<float> kANCHORS
    = {0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828};

These need to be multiplied by the stride.
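
For example, with the constants from your snippet (a sketch; kSTRIDE_1 = 32):

// Scale the grid-cell-unit yolov2 anchors to network input resolution
// by multiplying each value by the stride of the output layer.
std::vector<float> scaledAnchors;
scaledAnchors.reserve(kANCHORS.size());
for (const float a : kANCHORS)
    scaledAnchors.push_back(a * kSTRIDE_1);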

How are you computing the values of ph and pw? There are no masks in yolov2, yet you still seem to be using them. All the information you need to implement the decoding of bounding boxes from the network's output is present over here. Please check that your implementation is exactly the same.

https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/master/yolo/lib/yolov2.cpp#L33

Yes, I did multiply by the stride when you suggested that in your answer posted 07/02/2019 05:48 PM.

My ph and pw are in line with your suggestion:

const float pw = anchors[2 * b];
const float ph = anchors[2 * b + 1];

...

const float bw = pw
    * exp(detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 2)]);
const float bh = ph
    * exp(detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 3)]);

In fact, from the beginning I’ve referred to https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/master/yolo/lib/yolov2.cpp#L33. The thing is, the bounding boxes are all messy.

I’m not a big expert in yolo, but do you suspect the problem could be that the way the final layer output is stacked in yolov2 differs from yolov3?

The output layer implementations of yolov2 and yolov3 are different, and the changes in the decodeTensor function should take care of that.

Can you share a sample image of what the outputs look like? I would double-check that the cfg file used to generate the engine has the same parameters as your NvDsInferParseCustomYoloV3(…) function. Typically, if the network input height and width differ between the engine file and the parsing function, you would see such behavior.
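
As a quick way to catch that, you could add an early check to the parsing function (a sketch, assuming the standard NvDsInferParseCustomYoloV3 signature where networkInfo carries the engine's input dimensions):

// Fail fast if the engine's input resolution disagrees with the
// constants hard-coded in the parsing function.
if (networkInfo.width != kINPUT_W || networkInfo.height != kINPUT_H)
{
    std::cerr << "Input resolution mismatch: engine " << networkInfo.width
              << "x" << networkInfo.height << " vs parser " << kINPUT_W
              << "x" << kINPUT_H << std::endl;
    return false;
}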