I was following the sample code https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/tree/master/yolo/samples/objectDetector_YoloV3
I modified the respective variables and was able to build, but the detection boxes are all incorrect. Is there anywhere I can find a yolov2 sample app for DS?
CJR
June 28, 2019, 5:23pm
2
Hi,
Can you provide more information regarding the changes you have made and what's incorrect in the output detections?
One of the obvious things that needs to be done is to create a yolov2 engine, which I did. The rest of the computation is pretty much the same as objectDetector_YoloV3. I configured the paths accordingly and can see that the yolov2 engine is being used, but the detection boxes are very small. Do we have to scale the bounding box values for yolov2? Any help in this regard is much appreciated.
CJR
July 1, 2019, 5:41pm
4
Yes, that is exactly what I've done. In https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/3a8957b2d985d7fc2498a0f070832eb145e809ca/yolo/samples/objectDetector_YoloV3/nvdsinfer_custom_impl_YoloV3/nvdsparsebbox_YoloV3.cpp#L160:
const float bx = x + detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 0)];
const float by = y + detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 1)];
const float bw = pw * exp(detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 2)]);
const float bh = ph * exp(detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 3)]);
In https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/3a8957b2d985d7fc2498a0f070832eb145e809ca/yolo/samples/objectDetector_YoloV3/nvdsinfer_custom_impl_YoloV3/nvdsparsebbox_YoloV3.cpp#L210:
static int outputBlobIndex1 = -1;
static const int NUM_CLASSES_YOLO_V2 = 80;
static bool classMismatchWarn = false;

if (outputBlobIndex1 == -1)
{
    for (uint i = 0; i < outputLayersInfo.size(); i++)
    {
        if (strcmp(outputLayersInfo[i].layerName, "region_32") == 0)
        {
            outputBlobIndex1 = i;
            break;
        }
    }
    if (outputBlobIndex1 == -1)
    {
        std::cerr << "Could not find output layer 'region_32' while parsing" << std::endl;
        return false;
    }
}
And in https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/3a8957b2d985d7fc2498a0f070832eb145e809ca/yolo/samples/objectDetector_YoloV3/nvdsinfer_custom_impl_YoloV3/nvdsparsebbox_YoloV3.cpp#L275:
std::vector<float*> outputBlobs(1, nullptr);
outputBlobs.at(0) = (float*) outputLayersInfo[outputBlobIndex1].buffer;

const float kNMS_THRESH = 0.4f;
const float kPROB_THRESH = 0.5f;
const uint kNUM_BBOXES = 5;
const uint kINPUT_H = 608;
const uint kINPUT_W = 608;
const uint kSTRIDE_1 = 32;
const uint kGRID_SIZE_1 = kINPUT_H / kSTRIDE_1;

std::vector<NvDsInferParseObjectInfo> objects;
std::vector<NvDsInferParseObjectInfo> objects1
    = decodeTensor(outputBlobs.at(0), kMASK_1, kANCHORS, kGRID_SIZE_1, kSTRIDE_1, kNUM_BBOXES,
                   NUM_CLASSES_YOLO_V2, kPROB_THRESH, kINPUT_W, kINPUT_H);

objectList.clear();
objectList = nmsAllClasses(kNMS_THRESH, objects1, NUM_CLASSES_YOLO_V2);
But my results are all goofed up.
CJR
July 2, 2019, 5:48pm
6
What about anchors? Have you updated them? Anchors differ between yolov2 and yolov3, and they also need to be in network input resolution.
Have a look at how it's done here - deepstream_reference_apps/yolo.cpp at 3a8957b2d985d7fc2498a0f070832eb145e809ca · NVIDIA-AI-IOT/deepstream_reference_apps · GitHub
Hey, hi, thanks for pointing that out to me. I modified the code accordingly and multiplied the anchors by the stride, but the results are the same.
FYI, my anchors for yolov2 are defined thus:
const std::vector<float> kANCHORS
= {0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828};
CJR
July 3, 2019, 7:01pm
8
const std::vector<float> kANCHORS
= {0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828};
These need to be multiplied by stride.
How are you computing the values of ph and pw? There are no masks in yolov2, yet you still seem to be using them. All the information you need to implement the decoding of bounding boxes from the network's output is present here. Please check if your implementation is exactly the same.
https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/master/yolo/lib/yolov2.cpp#L33
Yes, I did multiply by the stride when you suggested that in your answer posted 07/02/2019 05:48 PM.
My ph and pw follow your suggestion:
const float pw = anchors[2 * b];
const float ph = anchors[2 * b + 1];
...
const float bw = pw * exp(detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 2)]);
const float bh = ph * exp(detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 3)]);
In fact, from the beginning I've referred to https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/blob/master/yolo/lib/yolov2.cpp#L33. The thing is, the bounding boxes are all messy.
I'm not a big expert in yolo, but do you suspect the way the final layer output is stacked in yolov2 differs from yolov3, and could that be the problem?
CJR
July 8, 2019, 9:52pm
10
The output layer implementations of yolov2 and yolov3 are different, and the changes in the decodeTensor function should take care of that.
Can you share a sample image of how the outputs look? I would double-check that the cfg file used to generate the engine has the same parameters as your NvDsInferParseCustomYoloV3(…) function. Typically, if the network input height and width differ between the engine file and the parsing function, you would see this kind of behavior.