Loading OCDNet as sgie0

I am trying to load the OCDNet model as an sgie in DeepStream. I can load the model, but DeepStream fails to parse the output. What is the name of the output-blob-names property for the pre-trained model from NGC? I set it to pred but it is not working. Also, for parsing, will nvocdr's libnvocdr_impl.so work only for OCDNet?

Actually, I was able to load OCDNet as a secondary engine, but I am getting no output from it, and no error either. The problem is probably in the parser.

Config for sgie0

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
#custom-network-config=yolo-obj-box-detection.cfg
#model-file=yolo-obj_best_box.weights
#onnx-file=yolov4_-1_3_608_608_dynamic.onnx
onnx-file=/home/sigmind/deepstream_sdk_v6.3.0_x86_64/opt/nvidia/deepstream/deepstream-6.3/samples/models/Secondary_VehicleTypes/ocdnet.onnx
#model-engine-file=model_b4_gpu0_fp32.engine
model-engine-file=/home/sigmind/deepstream_sdk_v6.3.0_x86_64/opt/nvidia/deepstream/deepstream-6.3/samples/models/Secondary_VehicleTypes/ocdnet.fp16.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=2
num-detected-classes=1
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=0
symmetric-padding=1
force-implicit-batch-dim=0
#workspace-size=2000
parse-bbox-func-name=NvDsInferParseYolo
#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=/media/sigmind/URSTP_HDD1416/DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
output-tensor-meta=1

[class-attrs-all]
pre-cluster-threshold=0.2
topk=300

Different models require different post-processing source code. The configuration file you provided will not work; it is only applicable to the output parsing of some YOLO models.
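For example, for OCDNet the sgie config would need to point at an OCDNet-specific parsing function rather than the YOLO one. The function and library names below are placeholders for whatever you implement, not files shipped with DeepStream:

parse-bbox-func-name=NvDsInferParseOCDNet
custom-lib-path=/path/to/libnvdsinfer_custom_impl_ocdnet.so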

For OCDNet, this is the sample code.

But OCDNet usually has to work together with OCRNet. In addition, OCDNet is usually used as the PGIE.

Can you share your goal? I don’t understand your intention.

I know the YOLO parsing won’t work, but I could successfully load the model. Now I am working on the parsing. My goal is to detect boxes on a conveyor belt, then detect text on the boxes, then OCR the text. I am thinking of a box-detection YOLO model as the pgie, OCDNet as sgie0, and OCRNet as sgie1.

Actually, I was able to generate the parser. It would be helpful if you could suggest any modifications for proper parsing.

#include "nvdsinfer_custom_impl.h"
#include <opencv2/opencv.hpp>

extern "C" bool NvDsInferParseYolo(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList);

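// Score a contour as the mean of the prediction map inside the contour
// polygon; this value is later used as the detection confidence. Note that
// the function is called with the raw float probability map, despite the
// parameter being named "binary".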
static float contourScore(const cv::Mat& binary, const std::vector<cv::Point>& contour) {
    cv::Rect rect = cv::boundingRect(contour);
    int xmin = std::max(rect.x, 0);
    int xmax = std::min(rect.x + rect.width, binary.cols - 1);
    int ymin = std::max(rect.y, 0);
    int ymax = std::min(rect.y + rect.height, binary.rows - 1);

    cv::Mat binROI = binary(cv::Rect(xmin, ymin, xmax - xmin + 1, ymax - ymin + 1));
    cv::Mat mask = cv::Mat::zeros(ymax - ymin + 1, xmax - xmin + 1, CV_8U);
    
    std::vector<cv::Point> roiContour;
    for (const auto& pt : contour) {
        roiContour.emplace_back(cv::Point(pt.x - xmin, pt.y - ymin));
    }
    std::vector<std::vector<cv::Point>> roiContours = {roiContour};
    cv::fillPoly(mask, roiContours, cv::Scalar(1));
    
    return cv::mean(binROI, mask)[0];
}

static NvDsInferParseObjectInfo convertBBox(const cv::RotatedRect& box, const uint& netW, const uint& netH) {
    NvDsInferParseObjectInfo b;
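    // Note: boundingRect() below replaces the rotated rect with its
    // axis-aligned bounding box, so the angle from minAreaRect() is discarded.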
    cv::Rect bbox = box.boundingRect();
    
    // Clamp values to network dimensions
    bbox.x = std::max(0, std::min(bbox.x, (int)netW));
    bbox.y = std::max(0, std::min(bbox.y, (int)netH));
    bbox.width = std::min(bbox.width, (int)netW - bbox.x);
    bbox.height = std::min(bbox.height, (int)netH - bbox.y);
    
    b.left = bbox.x;
    b.top = bbox.y;
    b.width = bbox.width;
    b.height = bbox.height;
    
    return b;
}

static std::vector<NvDsInferParseObjectInfo> decodeTensorYolo(
    const float* output,
    const uint& outputH, const uint& outputW,
    const uint& netW, const uint& netH,
    const std::vector<float>& preclusterThreshold)
{
    std::vector<NvDsInferParseObjectInfo> binfo;
    
    // Convert network output to OpenCV Mat
    cv::Mat predMap(outputH, outputW, CV_32F, (void*)output);
    
    // Threshold the prediction map
    cv::Mat binary;
    cv::threshold(predMap, binary, preclusterThreshold[0], 1.0, cv::THRESH_BINARY);
    binary.convertTo(binary, CV_8U);
    
    // Find contours
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(binary, contours, cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE);
    
    // Process each contour
    const float polygonThreshold = 0.3; // Same as default in OCDNetEngine
    const int maxContours = 200; // Same as default in OCDNetEngine
    
    size_t numCandidate = std::min(contours.size(), (size_t)maxContours);
    for (size_t i = 0; i < numCandidate; i++) {
        float score = contourScore(predMap, contours[i]);
        if (score < polygonThreshold) {
            continue;
        }
        
        // Get rotated rectangle
        cv::RotatedRect box = cv::minAreaRect(contours[i]);
        
        // Filter small boxes
        float shortSide = std::min(box.size.width, box.size.height);
        if (shortSide < 1) {
            continue;
        }
        
        // Convert to NvDsInferParseObjectInfo
        NvDsInferParseObjectInfo bbi = convertBBox(box, netW, netH);
        
        // Skip invalid detections
        if (bbi.width < 1 || bbi.height < 1) {
            continue;
        }
        
        bbi.detectionConfidence = score;
        bbi.classId = 0;  // Single class for text detection
        binfo.push_back(bbi);
    }
    
    return binfo;
}

static bool NvDsInferParseCustomYolo(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList)
{
    if (outputLayersInfo.empty()) {
        std::cerr << "ERROR: Could not find output layer in bbox parsing" << std::endl;
        return false;
    }
    
    const NvDsInferLayerInfo& output = outputLayersInfo[0];
    
    // Get output dimensions
    const uint outputH = output.inferDims.d[1];  // Height
    const uint outputW = output.inferDims.d[2];  // Width
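    // Assumes the output tensor layout is (C, H, W) with a single
    // probability-map channel, so d[1]/d[2] are height/width; adjust these
    // indices if the engine reports a different layout (e.g. H, W, C).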
    
    std::vector<NvDsInferParseObjectInfo> objects = decodeTensorYolo(
        (const float*)(output.buffer),
        outputH, outputW,
        networkInfo.width, networkInfo.height,
        detectionParams.perClassPreclusterThreshold);
    
    objectList = objects;
    return true;
}

extern "C" bool NvDsInferParseYolo(
    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
    NvDsInferNetworkInfo const& networkInfo,
    NvDsInferParseDetectionParams const& detectionParams,
    std::vector<NvDsInferParseObjectInfo>& objectList)
{
    return NvDsInferParseCustomYolo(
        outputLayersInfo,
        networkInfo,
        detectionParams,
        objectList);
}

CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseYolo);
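To load this parser through custom-lib-path, it is typically compiled into a shared library along these lines (a sketch; the source and library file names are placeholders, the include path should match your DeepStream installation, and you may need extra TensorRT/CUDA include paths):

g++ -shared -fPIC -o libnvdsinfer_custom_impl_ocdnet.so ocdnet_parser.cpp -I /opt/nvidia/deepstream/deepstream/sources/includes $(pkg-config --cflags --libs opencv4)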

Current output:

[screenshot]

I understand your intention, but the pipeline I suggest is this:

YOLO (pgie) --> videotemplate

Add a probe function at the pgie src pad, then crop the boxes to an image (a sketch follows below).

Refer to the implementation of deepstream-nvocdr-app.

When cropping the boxes, remove all padding.
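A minimal sketch of such a probe, assuming the standard DeepStream batch-metadata API (the actual cropping, e.g. through NvBufSurface or OpenCV on a mapped surface, is application specific):

#include "gstnvdsmeta.h"

// Hypothetical probe attached to the pgie src pad: walk the batch metadata
// and read each detected box; the crop handed to nvOCDR comes from these
// rect_params, with any inference padding removed.
static GstPadProbeReturn
pgie_src_pad_probe (GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
    GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER (info);
    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);
    if (!batch_meta)
        return GST_PAD_PROBE_OK;

    for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame;
         l_frame = l_frame->next) {
        NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
        for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj;
             l_obj = l_obj->next) {
            NvDsObjectMeta *obj_meta = (NvDsObjectMeta *) l_obj->data;
            NvOSD_RectParams *r = &obj_meta->rect_params;
            // r->left, r->top, r->width, r->height locate the YOLO box in
            // the frame; crop this region and feed it to the nvOCDR element.
            (void) r;
        }
    }
    return GST_PAD_PROBE_OK;
}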

I implemented the pipeline as you suggested, and also the pipeline as I had planned. In both implementations, OCDNet is unable to capture vertical text.

The sample code can recognize vertical text. Can you share the test stream so that we can test it?

This problem should only be caused by accuracy. I used deepstream_nvocdr_app for testing without adding YOLO as the pgie, and it worked normally; vertical text is also recognized correctly.

Actually I made a mistake. I am getting the result now. Thank you for the help.
