@AastaLLL
Yes, we are using 6.0.1… I saw that post and verified several days ago that we already had that patch.
Also, our issue isn’t speed, it’s accuracy/detection: we lost one to two orders of magnitude of detections when we went from DS 5 to DS 6. To get any detections at all, we had to turn our pre-cluster-threshold down to 0.001, which still didn’t yield usable results and didn’t offer any control. Obviously, something was very wrong. We found that many bounding boxes were returning large negative probabilities (-300%, -500%, even -800%).
But I think we have found the issue, and I want to share it because it’s a problem in the DeepStream code base. (/opt/nvidia/deepstream/deepstream-6.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/)
In DS 5, we were using a YOLO parser provided by an outside resource (and I don’t currently have its source). That parser wasn’t available for DS 6, so we used the libnvdsinfer_custom_impl_Yolo library provided with DeepStream 6.0.1.
Digging deep into the detection issues, we found that the provided code does not properly process the YOLOv2 (Tiny) output. The YOLOv2 output requires special parsing to normalize the bounding boxes and probabilities into something usable by the rest of the pipeline.
Looking at the parsing code below (from 6.0.1), the first thing I noticed was that there is no post-processing/scaling on any of the output fields except an exp on the w/h. YOLOv2 needs a sigmoid on the x/y and objectness, and a softmax on the class probabilities.
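For reference, the YOLOv2 (YOLO9000) paper decodes each anchor’s raw outputs (t_x, t_y, t_w, t_h, t_o and the class logits t_i) as follows, where σ is the logistic sigmoid, (c_x, c_y) is the grid cell offset, and (p_w, p_h) are the anchor priors:

b_x = c_x + σ(t_x)
b_y = c_y + σ(t_y)
b_w = p_w * exp(t_w)
b_h = p_h * exp(t_h)
Pr(object) = σ(t_o)
Pr(class_i | object) = exp(t_i) / Σ_j exp(t_j)    (softmax)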
static std::vector<NvDsInferParseObjectInfo>
decodeYoloV2Tensor(
    const float* detections, const std::vector<float> &anchors,
    const uint gridSizeW, const uint gridSizeH, const uint stride, const uint numBBoxes,
    const uint numOutputClasses, const uint& netW,
    const uint& netH)
{
    std::vector<NvDsInferParseObjectInfo> binfo;
    for (uint y = 0; y < gridSizeH; ++y) {
        for (uint x = 0; x < gridSizeW; ++x) {
            for (uint b = 0; b < numBBoxes; ++b)
            {
                const float pw = anchors[b * 2];
                const float ph = anchors[b * 2 + 1];
                const int numGridCells = gridSizeH * gridSizeW;
                const int bbindex = y * gridSizeW + x;
                // Raw x/y outputs used directly -- no sigmoid:
                const float bx
                    = x + detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 0)];
                const float by
                    = y + detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 1)];
                const float bw
                    = pw * exp (detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 2)]);
                const float bh
                    = ph * exp (detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 3)]);
                // Raw objectness logit -- no sigmoid:
                const float objectness
                    = detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 4)];
                float maxProb = 0.0f;
                int maxIndex = -1;
                // Raw class logits -- no softmax:
                for (uint i = 0; i < numOutputClasses; ++i)
                {
                    float prob
                        = (detections[bbindex
                                      + numGridCells * (b * (5 + numOutputClasses) + (5 + i))]);
                    if (prob > maxProb)
                    {
                        maxProb = prob;
                        maxIndex = i;
                    }
                }
                maxProb = objectness * maxProb;
                addBBoxProposal(bx, by, bw, bh, stride, netW, netH, maxIndex, maxProb, binfo);
            }
        }
    }
    return binfo;
}
I modified the code as shown below. (I will update this post if I find any further issues, but this version gives very “real” results compared to the supplied function.)
// Numerically stable logistic sigmoid: branches on the sign of x so that
// exp() is only ever called on a non-positive argument and cannot overflow.
static float safesigmoid(float x)
{
    if (x > 0.0f)
    {
        return (float)(1.0f / (1.0f + exp(-x)));
    }
    else
    {
        auto e = exp(x);
        return (float)(e / (1.0f + e));
    }
}

static std::vector<NvDsInferParseObjectInfo>
decodeYoloV2Tensor(
    const float* detections, const std::vector<float> &anchors,
    const uint gridSizeW, const uint gridSizeH, const uint stride, const uint numBBoxes,
    const uint numOutputClasses, const uint& netW,
    const uint& netH)
{
    std::vector<NvDsInferParseObjectInfo> binfo;
    for (uint y = 0; y < gridSizeH; ++y) {
        for (uint x = 0; x < gridSizeW; ++x) {
            for (uint b = 0; b < numBBoxes; ++b)
            {
                const float pw = anchors[b * 2];
                const float ph = anchors[b * 2 + 1];
                const int numGridCells = gridSizeH * gridSizeW;
                const int bbindex = y * gridSizeW + x;
                // Sigmoid on the x/y offsets, per the YOLOv2 decode:
                const float bx
                    = x + safesigmoid (detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 0)]);
                const float by
                    = y + safesigmoid (detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 1)]);
                const float bw
                    = pw * exp (detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 2)]);
                const float bh
                    = ph * exp (detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 3)]);
                // Sigmoid on the objectness logit:
                const float objectness
                    = safesigmoid(detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 4)]);
                float maxProb = 0.0f;
                int maxIndex = -1;
                float sum = 0.0f;
                // Softmax over the class logits: exponentiate each one, track
                // the maximum, and normalize by the sum afterwards.
                for (uint i = 0; i < numOutputClasses; ++i)
                {
                    float prob
                        = exp(detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + (5 + i))]);
                    sum += prob;
                    if (prob > maxProb)
                    {
                        maxProb = prob;
                        maxIndex = i;
                    }
                }
                if (sum > 0.0f)
                {
                    maxProb = objectness * maxProb / sum;
                }
                addBBoxProposal(bx, by, bw, bh, stride, netW, netH, maxIndex, maxProb, binfo);
            }
        }
    }
    return binfo;
}
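To make the failure mode concrete, here is a minimal standalone sketch (not part of the DeepStream sources; the logit values are made up for illustration, not taken from our model) showing how multiplying raw logits produces the large negative “percentages” we were seeing, while the sigmoid/softmax path stays in [0, 1]:

// decode_demo.cpp -- compile with: g++ -std=c++11 decode_demo.cpp
#include <cmath>
#include <cstdio>

// Same numerically stable sigmoid as in the patched parser.
static float safesigmoid(float x)
{
    if (x > 0.0f)
        return 1.0f / (1.0f + expf(-x));
    float e = expf(x);
    return e / (1.0f + e);
}

int main()
{
    // Hypothetical raw outputs for one anchor: an objectness logit
    // followed by three class logits.
    const float objLogit = -4.0f;
    const float classLogits[3] = { 2.0f, -1.0f, 0.5f };

    // Unpatched path: raw logits multiplied together.
    float rawMax = 0.0f;
    for (float c : classLogits)
        if (c > rawMax) rawMax = c;
    printf("raw 'probability':   %.1f%%\n", objLogit * rawMax * 100.0f);  // -800.0%

    // Patched path: sigmoid(objectness) * softmax(class logits).
    float sum = 0.0f, maxProb = 0.0f;
    for (float c : classLogits) {
        float e = expf(c);
        sum += e;
        if (e > maxProb) maxProb = e;
    }
    float prob = safesigmoid(objLogit) * (maxProb / sum);
    printf("decoded probability: %.1f%%\n", prob * 100.0f);  // ~1.4%
    return 0;
}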
I wanted to update you on this as soon as possible and get this searchable for others dealing with poor detections on YOLOv2 Tiny, especially those using models trained on Microsoft Azure Custom Vision ONNX models (Compact Domain). I assume there may be equivalent issues with the other YOLO versions as well, but I don’t have the time or test setup to verify and/or fix them.
Please let me know what you think, whether you have any questions, and whether there is anything additional I can supply.