OCRNet parse function for DeepStream

• Hardware Platform (Jetson / GPU)
GPU
• DeepStream Version
DeepStream 7.0

• JetPack Version (valid for Jetson only)
• TensorRT Version
8.6
• NVIDIA GPU Driver Version (valid for GPU only)
550.67
• Issue Type( questions, new requirements, bugs)
new requirements
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hello,

I’m working on a project that involves using the OCRNet model to extract text from images. I’m using the DeepStream preprocess pipeline to isolate the text region before feeding it into OCRNet.

I’ve noticed that the parse function for OCRNet is often bundled with OCDNet in the provided examples. I’m wondering if there’s a separate parse function available for OCRNet that can be used independently.

I’m familiar with the LPRNet model, which has a dedicated parse function. I’m hoping for a similar functionality for OCRNet.

Could someone help me with this? Code examples would be greatly appreciated.

Hi @Levi_Pereira, the C++ parsing function of OCRNet’s output is wrapped in the library nvOCDR.

For OCRNetV1 (CTC parsing): NVIDIA-Optical-Character-Detection-and-Recognition-Solution/src/OCRNetEngine.cpp at main · NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution · GitHub

For OCRNetV2.x (Attention parsing): NVIDIA-Optical-Character-Detection-and-Recognition-Solution/src/OCRNetEngine.cpp at main · NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution · GitHub

Basically, the input to these OCRNet parsing functions is the raw output from the OCRNet TensorRT engine.

For reference, we also have a separate OCRNet inference Python sample in tao_deploy: tao_deploy/nvidia_tao_deploy/cv/ocrnet/scripts/inference.py at main · NVIDIA/tao_deploy · GitHub

And you can find the Python version of the parsing function for OCRNet at: tao_deploy/nvidia_tao_deploy/cv/ocrnet/utils.py at main · NVIDIA/tao_deploy · GitHub
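
For orientation, here is a minimal sketch of the greedy CTC decoding those references implement (this is not the official tao_deploy code; it assumes index 0 is the CTC blank and that output_id / output_prob are the raw per-timestep engine outputs):

import numpy as np

def ctc_decode(output_id, output_prob, charset):
    # charset holds only real characters; the CTC blank (id 0) is implicit
    text, conf, prev = "", 1.0, -1
    for idx, prob in zip(np.asarray(output_id).tolist(),
                         np.asarray(output_prob).tolist()):
        if idx != prev and idx != 0:   # collapse repeats, skip the blank
            text += charset[idx - 1]   # shift by 1 because the blank occupies id 0
            conf *= prob
        prev = idx
    return text, conf

charset = list("0123456789abcdefghijklmnopqrstuvwxyz")
# text, conf = ctc_decode(ids_from_engine, probs_from_engine, charset)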

Hi @tylerz
Thank you for the links; they have been very helpful in understanding the implementation of CTC and the Attention Decoder. However, in the examples provided, the entire process uses Gst-nvdsvideotemplate, encapsulating the pre-processing, inference and post-processing within the library.

My objective is a bit different. I aim to use the pipeline nvstreammux > nvdspreprocess > nvinferserver (with Triton Server). In the Gst-nvinferserver component, I need to configure the post-processing with custom_parse_classifier_func and a custom_lib.

Currently, I have successfully configured the model on Triton Server, and it is functioning. Now, I want to implement the OCRNet model in the DeepStream pipeline using nvinferserver. However, to complete this process, I need the custom_parse_classifier_func as part of the post-processing step.

Do you already have the custom_parse_classifier_func library for nvinferserver, or do we need to build it ourselves? If you have something ready, I would greatly appreciate it.

I need something like this:

bool NvDsInferParseOCRNet(std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
                                 NvDsInferNetworkInfo const &networkInfo, float classifierThreshold,
                                 std::vector<NvDsInferAttribute> &attrList, std::string &attrString)

This is very useful for reading information written on frames where the text area never changes, so we do not need OCDNet, only OCRNet with nvdspreprocess. The applications of the OCRNet model in this situation are extensive.

Hi @Levi_Pereira, sorry but we don’t have NvDsInferParseOCRNet. I think you can leverage the C++ implementation in the library. Basically, outputLayersInfo is the raw output from the TensorRT engine, which is the same as the output we get here in the library. All you need then is to parse the raw output into the text.

I think @Fiona.Chen could give you more guidance to implement this function if you need.

Thanks,
Tyler

@Levi_Pereira

Do you mean you need to customize the postprocessing of the OCRNet output layers?

@Fiona.Chen I’m looking for a custom_parse_classifier_func library for gst-nvinferserver, similar to the LPRNet parse function. My goal is to use the OCRNet model independently of the NVIDIA-Optical-Character-Detection-and-Recognition-Solution, which relies on nvdsvideotemplate.

There is no separate custom_parse_classifier_func for OCRNet, but the output layer parsing and processing code is here: NVIDIA-Optical-Character-Detection-and-Recognition-Solution/src/OCRNetEngine.cpp at main · NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution (github.com). You can port it.

Hi,

I’ve successfully implemented the custom_parse_classifier_func for OCRNet and validated that it correctly sends data to the pipeline.
My pipeline is nvstreammux > nvdspreprocess > pgie (triton-server) with a probe function on the PGIE.
I use Gst-nvdspreprocess to send only the text area for OCR. Everything works perfectly up to the PGIE, but I can’t retrieve the data in the probe because frame_meta.obj_meta_list appears as None.
I seem to be missing something in the implementation.

Code snippet:

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    

    while l_frame:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        frame_number=frame_meta.frame_num
        l_obj=frame_meta.obj_meta_list  ## THIS IS NONE

Triton Server Conf

 name: "nvidia-ocrnet"
 platform: "tensorrt_plan"
 max_batch_size: 32
 input [
   {
     name: "input"
     data_type: TYPE_FP32
     format: FORMAT_NCHW
     dims: [1 , 32, 100]
   }
 ]
 output [
   {
     name: "output_id"
     data_type: TYPE_INT32
     dims: [ 26 ]
   },
   {
     name: "output_prob"
     data_type: TYPE_FP32
     dims: [ 26 ]
   },
   {
     name: "798"
     data_type: TYPE_INT32
     dims: [ 26 ]
   }
 ]
 instance_group [
     {
       count: 1
       kind: KIND_GPU
       gpus: [ 0 ]
     }
 ]
 version_policy: { latest: { num_versions: 1}}
 dynamic_batching {
   max_queue_delay_microseconds: 0
 }
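
To rule out the engine/Triton side, a quick standalone query can confirm the tensor names, shapes, and dtypes in the config above (a hedged sketch using tritonclient; the random input only checks the plumbing, not recognition quality):

import numpy as np
import tritonclient.grpc as grpcclient

# Endpoint, model name, and tensor names are taken from the config above.
client = grpcclient.InferenceServerClient(url="127.0.0.1:8001")
img = np.random.rand(1, 1, 32, 100).astype(np.float32)   # NCHW grayscale, 32x100
inp = grpcclient.InferInput("input", list(img.shape), "FP32")
inp.set_data_from_numpy(img)
outputs = [grpcclient.InferRequestedOutput(n) for n in ("output_id", "output_prob")]
result = client.infer(model_name="nvidia-ocrnet", inputs=[inp], outputs=outputs)
print(result.as_numpy("output_id").shape, result.as_numpy("output_prob").shape)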

config_preprocess.txt

[property]
enable=1
target-unique-ids=1
process-on-frame=1

# if enabled maintain the aspect ratio while scaling
maintain-aspect-ratio=1

# if enabled pad symmetrically with maintain-aspect-ratio enabled
symmetric-padding=1

# processing width/height at which image scaled
processing-width=100
processing-height=32

scaling-buf-pool-size=6
tensor-buf-pool-size=6

# 0=NCHW, 1=NHWC, 2=CUSTOM
network-input-order=0

# tensor shape based on network-input-order
network-input-shape=32;1;32;100

# 0=RGB, 1=BGR, 2=GRAY
network-color-format=2

# 0=FP32, 1=UINT8, 2=INT8, 3=UINT32, 4=INT32, 5=FP16
tensor-data-type=0

tensor-name=input

# 0=NVBUF_MEM_DEFAULT 1=NVBUF_MEM_CUDA_PINNED 2=NVBUF_MEM_CUDA_DEVICE 3=NVBUF_MEM_CUDA_UNIFIED
scaling-pool-memory-type=0

# 0=NvBufSurfTransformCompute_Default 1=NvBufSurfTransformCompute_GPU 2=NvBufSurfTransformCompute_VIC
scaling-pool-compute-hw=0

# Scaling Interpolation method
# 0=NvBufSurfTransformInter_Nearest 1=NvBufSurfTransformInter_Bilinear 2=NvBufSurfTransformInter_Algo1
# 3=NvBufSurfTransformInter_Algo2 4=NvBufSurfTransformInter_Algo3 5=NvBufSurfTransformInter_Algo4
# 6=NvBufSurfTransformInter_Default
scaling-filter=0

custom-lib-path=/opt/nvidia/deepstream/deepstream/lib/gst-plugins/libcustom2d_preprocess.so
custom-tensor-preparation-function=CustomTensorPreparation

output-tensor-meta=1

[user-configs]
pixel-normalization-factor=0.00784313
#mean-file=
offsets=127.5

[group-0]
src-ids=0
custom-input-transformation-function=CustomAsyncTransformation
process-on-roi=1
roi-params-src-0=85;121;235;61
draw-roi=1
roi-color=1;1;1;1
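
It is also worth confirming that the normalization here matches what the OCRNet engine was exported with. Assuming nvdspreprocess applies y = pixel-normalization-factor * (x - offsets), the values above (pixel-normalization-factor=0.00784313, offsets=127.5) map 8-bit grayscale pixels to roughly [-1, 1]:

# Hedged check of the assumed formula y = factor * (x - offset)
factor, offset = 0.00784313, 127.5
print(factor * (0 - offset), factor * (255 - offset))   # ~ -1.0 and ~ 1.0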

pgie_conf.txt

infer_config {
  unique_id: 1
  gpu_ids: [0]
  max_batch_size: 32
  backend {

    triton {
      model_name: "nvidia-ocrnet"
      version: -1
      grpc {
        url: "127.0.0.1:8001"
        enable_cuda_buffer_sharing: true
      }
    }
  }

  input_tensor_from_meta { 
      is_first_dim_batch : true 
  }

  #preprocess {
  #  network_format: IMAGE_FORMAT_GRAY 
  #  tensor_order: TENSOR_ORDER_NONE
  #  normalize {
  #    scale_factor: 0.00784313
  #  }
  #}

  postprocess {
     classification {
      threshold:0.2
      custom_parse_classifier_func: "NvDsInferParseOCRNetCTC"
    }
  }
  extra {
    copy_input_to_host_buffers: false
    output_buffer_pool_size: 6
  }
  
  custom_lib {
    path: "/apps/custom_lib/nvocr/nvinfer_ocrnet_parser.so"
  }
}

input_control {
  process_mode : PROCESS_MODE_FULL_FRAME
  interval : 0
}

output_control {
  output_tensor_meta: true
}

custom_parse_classifier_func

#include <string>
#include <vector>
#include <iostream>
#include <locale>
#include <cstring>
#include <type_traits>  // for std::extent (size of the hardcoded dictionary below)
#include "nvdsinfer_custom_impl.h"

using namespace std;
using std::string;
using std::vector;

static bool ocr_dict_ready = false;
std::vector<string> ocr_dict_table;

/* C-linkage to prevent name-mangling */
extern "C"
bool NvDsInferParseOCRNetCTC(std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
                                NvDsInferNetworkInfo const &networkInfo, float classifierThreshold,
                                std::vector<NvDsInferAttribute> &attrList, std::string &attrString);

extern "C" 
bool NvDsInferParseOCRNetCTC(std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
                                NvDsInferNetworkInfo const &networkInfo, float classifierThreshold,
                                std::vector<NvDsInferAttribute> &attrList, std::string &attrString)
{
    NvDsInferAttribute OCR_attr;

    if (!ocr_dict_ready) {
        static const char* hardcodedOCRDict[] = {
            "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
            "a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
            "k", "l", "m", "n", "o", "p", "q", "r", "s", "t",
            "u", "v", "w", "x", "y", "z"
        };
        ocr_dict_table.emplace_back("CTCBlank");
        for (size_t i = 0; i < std::extent<decltype(hardcodedOCRDict)>::value; ++i) {
            ocr_dict_table.emplace_back(hardcodedOCRDict[i]);
        } 
        ocr_dict_ready = true;
    }
    

    if (outputLayersInfo.size() != 3)
    {
        std::cerr << "Mismatch in the number of output buffers."
                  << "Expected 3 output buffers, detected in the network: "
                  << outputLayersInfo.size() << std::endl;
        return false;
    }

    auto layerFinder = [&outputLayersInfo](const std::string &name)
        -> const NvDsInferLayerInfo *{
        for (auto &layer : outputLayersInfo) {
            if (layer.layerName && name == layer.layerName) {
                return &layer;
            }
        }
        return nullptr;
    };

    const NvDsInferLayerInfo *output_id = layerFinder("output_id");
    const NvDsInferLayerInfo *output_prob = layerFinder("output_prob");
    const NvDsInferLayerInfo *_798 = layerFinder("798");


    if (!output_id || !output_prob || !_798 ) {
        if (!output_id) {
            std::cerr << "  - output_id: Missing or unsupported data type." << std::endl;
        }

        if (!output_prob) {
            std::cerr << "  - output_prob: Missing or unsupported data type." << std::endl;
        }

        if (!_798) {
            std::cerr << "  - 798: Missing or unsupported data type." << std::endl;
        }
        return false;
    }

    if(output_id->inferDims.numDims != 1U) {
        std::cerr << "Network output_id dims is : " <<
            output_id->inferDims.numDims << " expect is 1"<< std::endl;
        return false;
    }
    if(output_prob->inferDims.numDims != 1U) {
        std::cerr << "Network output_prob dims is : " <<
            output_prob->inferDims.numDims << " expect is 1"<< std::endl;
        return false;
    }
    if(_798->inferDims.numDims != 1U) {
        std::cerr << "Network 798 dims is : " <<
            _798->inferDims.numDims << " expect is 1"<< std::endl;
        return false;
    }

    int batch_size = 1;
    int output_len = output_prob->inferDims.d[0];

    //std::cout << "Batch size: " << batch_size << std::endl;
    //std::cout << "Output length: " << output_len << std::endl;
    //std::cout << "networkInfo.width: " << networkInfo.width << std::endl;
    
    std::vector<std::pair<std::string, float>> temp_de_texts;

    int *output_id_data = reinterpret_cast<int*>(output_id->buffer);
    float *output_prob_data = reinterpret_cast<float*>(output_prob->buffer);

    // Greedy CTC decode: collapse consecutive duplicate ids, then drop the
    // CTC blank (id 0) while accumulating the per-character confidence.
    for(int batch_idx = 0; batch_idx < batch_size; ++batch_idx)
        {
            int b_offset = batch_idx * output_len; 
            int prev = output_id_data[b_offset];
            std::vector<int> temp_seq_id = {prev};
            std::vector<float> temp_seq_prob = {output_prob_data[b_offset]};
            for(int i = 1 ; i < output_len; ++i)
            {
                if (output_id_data[b_offset + i] != prev)
                {
                    temp_seq_id.push_back(output_id_data[b_offset + i]);
                    temp_seq_prob.push_back(output_prob_data[b_offset + i]);
                    prev = output_id_data[b_offset + i];
                }
            }
            std::string de_text = "";
            float prob = 1.0;
            for(size_t i = 0; i < temp_seq_id.size(); ++i)
            {
                if (temp_seq_id[i] != 0)
                {
                    if (temp_seq_id[i] <= static_cast<int>(ocr_dict_table.size()) - 1)
                    {
                        de_text += ocr_dict_table[temp_seq_id[i]];
                        prob *= temp_seq_prob[i];
                    }
                    else
                    {
                        std::cerr << "[ERROR] Character dict is not compatible with OCRNet TRT engine." << std::endl;
                    }
                }
            }
            temp_de_texts.emplace_back(std::make_pair(de_text, prob));
        }

    attrString = "";
    for (const auto& temp_text : temp_de_texts) {
        if (temp_text.second >= classifierThreshold) {
            attrString += temp_text.first;
        }
        //std::cout << "Decoded text: " << temp_text.first << ", Probability: " << temp_text.second <<  ", Threshold: " << classifierThreshold << std::endl;
    }

    OCR_attr.attributeIndex = 0;
    OCR_attr.attributeValue = 1;
    OCR_attr.attributeLabel = strdup(attrString.c_str()); 
    OCR_attr.attributeConfidence = 1.0;
    
    for (const auto& temp_text : temp_de_texts) {
        OCR_attr.attributeConfidence *= temp_text.second;
    }

    // std::cout << "attributeIndex: " << OCR_attr.attributeIndex << std::endl;
    // std::cout << "attributeValue: " << OCR_attr.attributeValue << std::endl;
    // std::cout << "attributeLabel: " << OCR_attr.attributeLabel << std::endl;
    // std::cout << "attributeConfidence: " << OCR_attr.attributeConfidence << std::endl;

    attrList.push_back(OCR_attr);

    return true;
}

CHECK_CUSTOM_CLASSIFIER_PARSE_FUNC_PROTOTYPE(NvDsInferParseOCRNetCTC);

Please refer to deepstream_lpr_app/deepstream-lpr-app/deepstream_lpr_app.c at master · NVIDIA-AI-IOT/deepstream_lpr_app (github.com) for how to get the classifier’s label.

I don’t understand how to access pyds.NvDsClassifierMeta if pyds.NvDsFrameMeta.obj_meta_list is None.

    std::cout << "attributeIndex: " << OCR_attr.attributeIndex << std::endl;
    std::cout << "attributeValue: " << OCR_attr.attributeValue << std::endl;
    std::cout << "attributeLabel: " << OCR_attr.attributeLabel << std::endl;
    std::cout << "attributeConfidence: " << OCR_attr.attributeConfidence << std::endl;

    attrList.push_back(OCR_attr);

Return:

attributeIndex: 0
attributeValue: 1
attributeLabel: preset2
attributeConfidence: 0.994145

attributeIndex: 0
attributeValue: 1
attributeLabel: preset2
attributeConfidence: 0.996097

attributeIndex: 0
attributeValue: 1
attributeLabel: preset2
attributeConfidence: 0.995123

attributeIndex: 0
attributeValue: 1
attributeLabel: preset2
attributeConfidence: 0.992203

    while l_frame:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        frame_number=frame_meta.frame_num
        l_obj=frame_meta.obj_meta_list
        num_rects = frame_meta.num_obj_meta
        ndfm_pad_index = frame_meta.pad_index
        print(num_rects, l_obj )

Return:

0 NONE
0 NONE
0 NONE
0 NONE

Please refer to the attached deepstream-test2, which is based on deepstream_python_apps/apps/deepstream-test2/deepstream_test_2.py at master · NVIDIA-AI-IOT/deepstream_python_apps (github.com).

deepstream_test_2.py (15.9 KB)

We seem to be having a communication or understanding issue here. The main problem is that frame_meta.obj_meta_list returns nothing.
Despite the setup and processing working correctly up to the PGIE, frame_meta.obj_meta_list appears as None.
Could you please assist in resolving why frame_meta.obj_meta_list is returning None?

l_obj=frame_meta.obj_meta_list
while l_obj is not None:

I use Gst-nvdspreprocess to send only the text area for OCR. Everything works perfectly up to the PGIE, but I can’t retrieve the data in the probe because frame_meta.obj_meta_list appears as NONE.

def sgie_pad_buffer_probe(pad,info,u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        print("Unable to get GstBuffer ")
        return
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            # Note that l_frame.data needs a cast to pyds.NvDsFrameMeta
            # The casting is done by pyds.NvDsFrameMeta.cast()
            # The casting also keeps ownership of the underlying memory
            # in the C code, so the Python garbage collector will leave
            # it alone.
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        l_obj=frame_meta.obj_meta_list
        while l_obj is not None:
            try:
                # Casting l_obj.data to pyds.NvDsObjectMeta
                obj_meta=pyds.NvDsObjectMeta.cast(l_obj.data)
            except StopIteration:
                break
            print('**@@## obj_meta found class ',obj_meta.class_id)
            l_objclass=obj_meta.classifier_meta_list
            while l_objclass is not None:
                try:
                    classifier_meta=pyds.NvDsClassifierMeta.cast(l_objclass.data)
                except StopIteration:
                    break
                l_labelinfo=classifier_meta.label_info_list
                label_num=0
                while l_labelinfo is not None:
                    try:
                        label_info=pyds.NvDsLabelInfo.cast(l_labelinfo.data)
                        print('%%###**label info ', label_info.result_label, ' num ', label_num)
                    except StopIteration:
                        break
                    label_num=label_num+1
                    l_labelinfo=l_labelinfo.next
                try:
                    l_objclass=l_objclass.next
                except StopIteration:
                    break
            try: 
                l_obj=l_obj.next
            except StopIteration:
                break
        try:
            l_frame=l_frame.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK

As to your description, you have attached the classifier labels in the postprocessing, but there is no object meta in the frame meta. Can you refer to NVIDIA-AI-IOT/deepstream_lpr_app: Sample app code for LPR deployment on DeepStream (github.com)? In this sample, the LPR model is an SGIE classifier with customized postprocessing, and we can get the object meta and label in the app. Please check whether your pipeline is correct and whether the object meta reading code has been put in the right place.

I have successfully implemented (LPR) in my environment, and it works well. However, I’m facing difficulties achieving the same result with OCR, even though the configuration is quite similar.

In the case of LPR, the inference is executed as a secondary inference, while for OCR, the classification inference is executed as a primary inference. Could this be causing the issue?

I have made the code available on GitHub for reference.

It is OK to set the classifier as PGIE. There should be one object in the frame meta.

Please check the " num_obj_meta" value in the frame meta.

This is the problem: num_obj_meta returns 0.

There is classifier PGIE sample in deepstream_tao_apps/apps/tao_classifier at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com)

I have successfully implemented PGIE with inference as a classifier, but I am encountering an issue where the data is not available in pyds.NvDsObjectMeta. I can only access the classifier metadata (classifier_meta_list) if it is available within pyds.NvDsObjectMeta.

I suspect the problem might be related to the NvDsInferParseOCRNetCTC function, which is supposed to send data to the pipeline. However, I implemented it following NVIDIA’s examples, so I need support to verify if this implementation is correct.

This issue is impacting our project, and finding a resolution is crucial. We have invested significantly in NVIDIA GPUs and shared all the relevant code to help diagnose the problem, but despite our efforts outlined in this post, we have not been able to make progress in resolving the issue.

The provided examples do not solve the issue.

Could you please provide guidance or suggest an alternative support channel where I can address this problem?

You’ve got the labels according to this log.

Yes. I have enabled this debug output to trace the data.

My pipeline output:

>> the attribute* lines come from the C++ parse lib

attributeIndex: 0
attributeValue: 1
attributeLabel: eresels
attributeConfidence: 0.28291
47 0 None  >> this comes from pipeline.py print(frame_number, num_rects,l_obj)
attributeIndex: 0
attributeValue: 1
attributeLabel: eresel
attributeConfidence: 0.673613
48 0 None
attributeIndex: 0
attributeValue: 1
attributeLabel: eresels
attributeConfidence: 0.418706