I am trying to load the OCDNet model as an SGIE in DeepStream. I could load the model, but DeepStream is failing to parse the output. What is the name in output-blob-names for the pre-trained model from NGC? I set it to pred, but it is not working. Also, for parsing, will the nvocdr libnvocdr_impl.so work only for OCDNet?
Actually, I was able to load OCDNet as a secondary engine, but I am getting no output from it, and no error either. Probably it is the parser.
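One way to confirm the exported output name, rather than guessing, is to inspect the ONNX file directly; a quick check, assuming the polygraphy tool (shipped with TensorRT) is installed:

polygraphy inspect model ocdnet.onnx

The output name it reports is what output-blob-names should be set to.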
Config for sgie0
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
#custom-network-config=yolo-obj-box-detection.cfg
#model-file=yolo-obj_best_box.weights
#onnx-file=yolov4_-1_3_608_608_dynamic.onnx
onnx-file=/home/sigmind/deepstream_sdk_v6.3.0_x86_64/opt/nvidia/deepstream/deepstream-6.3/samples/models/Secondary_VehicleTypes/ocdnet.onnx
#model-engine-file=model_b4_gpu0_fp32.engine
model-engine-file=/home/sigmind/deepstream_sdk_v6.3.0_x86_64/opt/nvidia/deepstream/deepstream-6.3/samples/models/Secondary_VehicleTypes/ocdnet.fp16.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=1
network-mode=2
num-detected-classes=1
interval=0
gie-unique-id=1
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=0
symmetric-padding=1
force-implicit-batch-dim=0
#workspace-size=2000
parse-bbox-func-name=NvDsInferParseYolo
#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=/media/sigmind/URSTP_HDD1416/DeepStream-Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
tensor-meta=1

[class-attrs-all]
pre-cluster-threshold=0.2
topk=300
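For reference, since this instance is intended to run as sgie0, a few of the keys above would normally change. A minimal sketch of the SGIE-specific settings, assuming the YOLO PGIE keeps gie-unique-id=1 and that pred is indeed the ONNX output name:

[property]
# operate as a secondary GIE on objects produced by the PGIE
process-mode=2
gie-unique-id=2
operate-on-gie-id=1
# name of the OCDNet output tensor (verify against the ONNX inspection above)
output-blob-names=pred

With process-mode=1 and gie-unique-id=1 as above, nvinfer would treat this instance as a primary detector and could clash with the actual PGIE.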
Different models require different post-processing source code. The configuration file you provided will not work; it only applies to the output parsing of some YOLO models.
For OCDNet, this is the sample code.
But OCDNet usually has to work together with OCRNet. In addition, OCDNet is usually used as a PGIE.
Can you share your goal? I don't understand your intention.
I know the YOLO parsing won't work, but I could successfully load the model, and now I am working on the parsing. My goal is to detect boxes on a conveyor belt, then detect text on the boxes, then OCR the text. I am thinking of a box-detection YOLO model as the PGIE, OCDNet as sgie0, and OCRNet as sgie1.
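As a rough sketch, the chain I have in mind would look like this with gst-launch (config file names and resolutions are placeholders):

gst-launch-1.0 uridecodebin uri=file:///path/to/conveyor.mp4 ! m.sink_0 \
  nvstreammux name=m batch-size=1 width=1920 height=1080 ! \
  nvinfer config-file-path=pgie_yolo.txt ! \
  nvinfer config-file-path=sgie0_ocdnet.txt ! \
  nvinfer config-file-path=sgie1_ocrnet.txt ! \
  nvvideoconvert ! nvdsosd ! nveglglessink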
Actually, I was able to generate the parser. It would be helpful if you could suggest any modifications for proper parsing.
#include "nvdsinfer_custom_impl.h"
#include <opencv2/opencv.hpp>
extern "C" bool NvDsInferParseYolo(
std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams,
std::vector<NvDsInferParseObjectInfo>& objectList);
static float contourScore(const cv::Mat& binary, const std::vector<cv::Point>& contour) {
cv::Rect rect = cv::boundingRect(contour);
int xmin = std::max(rect.x, 0);
int xmax = std::min(rect.x + rect.width, binary.cols - 1);
int ymin = std::max(rect.y, 0);
int ymax = std::min(rect.y + rect.height, binary.rows - 1);
cv::Mat binROI = binary(cv::Rect(xmin, ymin, xmax - xmin + 1, ymax - ymin + 1));
cv::Mat mask = cv::Mat::zeros(ymax - ymin + 1, xmax - xmin + 1, CV_8U);
std::vector<cv::Point> roiContour;
for (const auto& pt : contour) {
roiContour.emplace_back(cv::Point(pt.x - xmin, pt.y - ymin));
}
std::vector<std::vector<cv::Point>> roiContours = {roiContour};
cv::fillPoly(mask, roiContours, cv::Scalar(1));
return cv::mean(binROI, mask)[0];
}
static NvDsInferParseObjectInfo convertBBox(const cv::RotatedRect& box, const uint& netW, const uint& netH) {
NvDsInferParseObjectInfo b;
cv::Rect bbox = box.boundingRect();
// Clamp values to network dimensions
bbox.x = std::max(0, std::min(bbox.x, (int)netW));
bbox.y = std::max(0, std::min(bbox.y, (int)netH));
bbox.width = std::min(bbox.width, (int)netW - bbox.x);
bbox.height = std::min(bbox.height, (int)netH - bbox.y);
b.left = bbox.x;
b.top = bbox.y;
b.width = bbox.width;
b.height = bbox.height;
return b;
}
static std::vector<NvDsInferParseObjectInfo> decodeTensorYolo(
const float* output,
const uint& outputH, const uint& outputW,
const uint& netW, const uint& netH,
const std::vector<float>& preclusterThreshold)
{
std::vector<NvDsInferParseObjectInfo> binfo;
// Convert network output to OpenCV Mat
cv::Mat predMap(outputH, outputW, CV_32F, (void*)output);
// Threshold the prediction map
cv::Mat binary;
cv::threshold(predMap, binary, preclusterThreshold[0], 1.0, cv::THRESH_BINARY);
binary.convertTo(binary, CV_8U);
// Find contours
std::vector<std::vector<cv::Point>> contours;
cv::findContours(binary, contours, cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE);
// Process each contour
const float polygonThreshold = 0.3; // Same as default in OCDNetEngine
const int maxContours = 200; // Same as default in OCDNetEngine
size_t numCandidate = std::min(contours.size(), (size_t)maxContours);
for (size_t i = 0; i < numCandidate; i++) {
float score = contourScore(predMap, contours[i]);
if (score < polygonThreshold) {
continue;
}
// Get rotated rectangle
cv::RotatedRect box = cv::minAreaRect(contours[i]);
// Filter small boxes
float shortSide = std::min(box.size.width, box.size.height);
if (shortSide < 1) {
continue;
}
// Convert to NvDsInferParseObjectInfo
NvDsInferParseObjectInfo bbi = convertBBox(box, netW, netH);
// Skip invalid detections
if (bbi.width < 1 || bbi.height < 1) {
continue;
}
bbi.detectionConfidence = score;
bbi.classId = 0; // Single class for text detection
binfo.push_back(bbi);
}
return binfo;
}
static bool NvDsInferParseCustomYolo(
std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams,
std::vector<NvDsInferParseObjectInfo>& objectList)
{
if (outputLayersInfo.empty()) {
std::cerr << "ERROR: Could not find output layer in bbox parsing" << std::endl;
return false;
}
const NvDsInferLayerInfo& output = outputLayersInfo[0];
// Get output dimensions
const uint outputH = output.inferDims.d[1]; // Height
const uint outputW = output.inferDims.d[2]; // Width
std::vector<NvDsInferParseObjectInfo> objects = decodeTensorYolo(
(const float*)(output.buffer),
outputH, outputW,
networkInfo.width, networkInfo.height,
detectionParams.perClassPreclusterThreshold);
objectList = objects;
return true;
}
extern "C" bool NvDsInferParseYolo(
std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams,
std::vector<NvDsInferParseObjectInfo>& objectList)
{
return NvDsInferParseCustomYolo(
outputLayersInfo,
networkInfo,
detectionParams,
objectList);
}
CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseYolo);
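To build it into a shared library (paths are illustrative; the DeepStream include directory and OpenCV flags may differ per install, and ocdnet_parser.cpp is just a placeholder filename):

g++ -shared -fPIC -o libnvdsinfer_ocdnet_parser.so ocdnet_parser.cpp \
    -I /opt/nvidia/deepstream/deepstream-6.3/sources/includes \
    -I /usr/local/cuda/include \
    $(pkg-config --cflags --libs opencv4)

Then point custom-lib-path at the resulting .so.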
Current output: (screenshot attachment)
I understand your intention, but the pipeline I suggest is this:

Yolo --> videotemplate
            |
            add a probe function at the PGIE src pad, then crop the boxes to an image

Refer to the implementation of deepstream-nvocdr-app. When cropping the boxes, remove all padding.
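A minimal sketch of such a probe at the PGIE src pad, iterating the metadata only; the actual pixel cropping, as done in deepstream-nvocdr-app, additionally uses the NvBufSurface/NvBufSurfTransform APIs, which is omitted here:

#include <gst/gst.h>
#include "gstnvdsmeta.h"

// Pad probe on the PGIE src pad: walk the batch metadata and read every
// detected box; this is the point where each box would be cropped out.
static GstPadProbeReturn
pgie_src_pad_probe(GstPad *pad, GstPadProbeInfo *info, gpointer user_data)
{
    GstBuffer *buf = GST_PAD_PROBE_INFO_BUFFER(info);
    NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta(buf);
    if (!batch_meta)
        return GST_PAD_PROBE_OK;

    for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame;
         l_frame = l_frame->next) {
        NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)l_frame->data;
        for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj;
             l_obj = l_obj->next) {
            NvDsObjectMeta *obj_meta = (NvDsObjectMeta *)l_obj->data;
            NvOSD_RectParams *r = &obj_meta->rect_params;
            // r->left / r->top / r->width / r->height give the box in
            // frame coordinates; crop the underlying NvBufSurface here.
            g_print("frame %d: box (%.0f, %.0f, %.0f x %.0f)\n",
                    frame_meta->frame_num, r->left, r->top,
                    r->width, r->height);
        }
    }
    return GST_PAD_PROBE_OK;
}

It would be attached with gst_pad_add_probe(pgie_src_pad, GST_PAD_PROBE_TYPE_BUFFER, pgie_src_pad_probe, NULL, NULL).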
I implemented the pipeline as you suggested, and also the pipeline as I originally planned. In both implementations, OCDNet is unable to capture vertical text.
The sample code can recognize vertical text. Can you share the test stream so that we can test it?
This problem should only be a matter of accuracy. I tested with deepstream_nvocdr_app without adding YOLO as the PGIE, and it worked normally; vertical text is also recognized correctly.
Actually I made a mistake. I am getting the result now. Thank you for the help.

