Parsing output with a custom parser produces decoding errors

My network has a custom CTCGreedyDecoder layer.
I have implemented it as a TensorRT plugin and tested it in TensorRT; everything works and I get correct results.

Now I would like to use DeepStream and have implemented all the requirements.
DeepStream needs a custom output parser.

CTCGreedyDecoder is the last layer, and its output is decoded in the parser.
For batch size 4, the input and output sizes at CTCGreedyDecoder are
input size : 88 4 48
output size: 4 20
CTCGreedyDecoder arranges its output in one flat buffer (4*20 elements long) as
|0…result0…19|0…result1…19|0…result2…19|0…result3…19|

In TensorRT, I parse the results one after another from this single buffer, roughly as in the sketch below.
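For reference, this is how my TensorRT-side parsing walks the flat buffer (a minimal sketch; the function and variable names are illustrative, while the 20-element stride and the decode[] lookup table are as described above):

#include <iostream>
#include <string>

// Sketch of the TensorRT-side parsing, assuming the layout above:
// outputBuf holds batchSize contiguous results of kResultLen floats
// each, and decode[] maps a class index to a character.
static const int kResultLen = 20;

void parseBatchedOutput(const float* outputBuf, int batchSize,
                        const char* decode)
{
    for (int b = 0; b < batchSize; b++) {
        const float* result = outputBuf + b * kResultLen;  // b-th segment
        std::string s;
        for (int d = 0; d < kResultLen; d++)
            s.push_back(decode[(int)result[d]]);
        std::cout << "result " << b << ": " << s << std::endl;
    }
}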

In DeepStream, the same custom plugin is used, but I don't see correct results.
I can still see correct dimensions at the CTCGreedyDecoder input and output.
However, in the output-parsing part, the following functions are called once per image in the batch:
if batch size is 4, they are called 4 times; if batch size is 3, 3 times.

nvdsinfer_context_impl.cpp (815:823)
NvDsInferStatus
ClassifyPostprocessor::parseEachBatch(
    const std::vector<NvDsInferLayerInfo>& outputLayers,
    NvDsInferFrameOutput& result)
{    
    result.outputType = NvDsInferNetworkType_Classifier;
    fillClassificationOutput(outputLayers, result.classificationOutput);
    return NVDSINFER_SUCCESS;
}

nvdsinfer_context_impl_output_parsing.cpp (782:826)
NvDsInferStatus
ClassifyPostprocessor::fillClassificationOutput(
    const std::vector<NvDsInferLayerInfo>& outputLayers,
    NvDsInferClassificationOutput& output)
{
    string attrString;
    vector<NvDsInferAttribute> attributes;
    /* Call custom parsing function if specified otherwise use the one
     * written along with this implementation. */
    if (m_CustomClassifierParseFunc)
    {
        
        //std::cout << "Inside m_CustomClassifierParseFunc" << std::endl;
        if (!m_CustomClassifierParseFunc(outputLayers, m_NetworkInfo,
                m_ClassifierThreshold, attributes, attrString))
        {
            printError("Failed to parse classification attributes using "
                    "custom parse function");
            return NVDSINFER_CUSTOM_LIB_FAILED;
        }
    }
    else
    {
        
        if (!parseAttributesFromSoftmaxLayers(outputLayers, m_NetworkInfo,
                m_ClassifierThreshold, attributes, attrString))
        {
            printError("Failed to parse bboxes");
            return NVDSINFER_OUTPUT_PARSING_FAILED;
        }
    }

    /* Fill the output structure with the parsed attributes. */
    output.label = strdup(attrString.c_str());
    output.numAttributes = attributes.size();
    output.attributes = new NvDsInferAttribute[output.numAttributes];
    for (size_t i = 0; i < output.numAttributes; i++)
    {
        output.attributes[i].attributeIndex = attributes[i].attributeIndex;
        output.attributes[i].attributeValue = attributes[i].attributeValue;
        output.attributes[i].attributeConfidence = attributes[i].attributeConfidence;
        output.attributes[i].attributeLabel = attributes[i].attributeLabel;
    }
    return NVDSINFER_SUCCESS;
}

That means parsing is not done on the whole batch at once; it is split per frame. So the output buffer from CTCGreedyDecoder is also cut into slices according to the number of images in the batch.
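So each call to the custom parser sees only one frame's slice of the layer buffer. A rough sketch of what I believe the slicing amounts to (illustrative only, not the exact DeepStream source; the struct is a stand-in for NvDsInferLayerInfo):

#include <cstddef>

// Illustrative sketch only: before each parseEachBatch() call, the
// output layer's buffer pointer is advanced to the current frame's
// slice, so a 4x20 batched output is seen by the custom parser as
// four independent 20-element buffers.
struct FrameSlice {
    const float* buffer;       // stand-in for NvDsInferLayerInfo::buffer
    std::size_t numElements;
};

FrameSlice sliceForFrame(const float* fullHostBuffer,
                         std::size_t elemsPerFrame,  // 20 here
                         std::size_t frameIndex)
{
    FrameSlice s;
    s.numElements = elemsPerFrame;
    s.buffer = fullHostBuffer + frameIndex * elemsPerFrame;
    return s;
}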

The following function is my custom parser.

extern "C"
bool NvDsInferParseCustomCTCGreedy (std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
        NvDsInferNetworkInfo  const &networkInfo,
        float classifierThreshold,
        std::vector<NvDsInferAttribute> &attrList,
        std::string &descString)
{
   for (unsigned int i = 0; i < outputLayersInfo.size(); i++) {
       if (strcmp(outputLayersInfo[i].layerName, "d_predictions:0") == 0) {
          NvDsInferDimsCHW dims;
          getDimsCHWFromDims(dims, outputLayersInfo[i].inferDims);
          std::cerr << "dims " << dims.c << std::endl;
          std::vector<char> str;
          float* data = (float *) outputLayersInfo[i].buffer;
          for(unsigned int d = 0; d < dims.c ; d++){
              //std::cerr << (int)*(data+d) << std::endl;
              //if(*(data+d) < 0)
              //   break;
              str.push_back(decode[(int)*(data+d)]);
          }
          std::string s(str.begin(), str.end());
          std::cerr << "decoded as " << s << std::endl;
          std::vector<char>().swap(str);
       }
    }
    return true;  

}
CHECK_CUSTOM_CLASSIFIER_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomCTCGreedy);

outputLayersInfo.size() is always 1, and dims.c, dims.w, dims.h are 20, 0, 0.

So the output buffer from CTCGreedyDecoder is cut into separate segments.
When I parse the 20-element output buffer, the results are not correct.

My SGIE config is as follows.

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
onnx-file=../../../../samples/models/platerect/numplate_recg_nhwc_removed_sparsetodense.onnx
model-engine-file=../../../../samples/models/platerect/numplate_recg_nhwc_removed_sparsetodense.onnx_batch_max10_gpu0_fp16.engine
#mean-file=../../../../samples/models/Secondary_CarColor/mean.ppm
labelfile-path=../../../../samples/models/platerect/labels.txt
#int8-calib-file=../../../../samples/models/Secondary_CarColor/cal_trt.bin
infer-dims=24;94;3
force-implicit-batch-dim=0
batch-size=10
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
input-object-min-width=20
input-object-min-height=10
process-mode=2
model-color-format=0
gpu-id=0
gie-unique-id=2
operate-on-gie-id=1
operate-on-class-ids=1
network-type=1
parse-classifier-func-name=NvDsInferParseCustomCTCGreedy
#parse-bbox-func-name=NvDsInferParseCustomCTCGreedy
custom-lib-path=/usr/src/tensorrt/CTCGreedyDecoder_Plugin/build/libCTCGreedyDecoder.so
output-blob-names=d_predictions:0
classifier-threshold=0

I implemented this successfully in TensorRT but failed in DeepStream. What could be the problem?
Where should I look? Can someone suggest anything?

No matter how the following input config parameters are changed, the output results do not change.

net-scale-factor=0.0039215697906911373
classifier-threshold=0
offsets=128.0;128.0;128.0

output-blob-names=test:0

I am going to test gst_nvinfer_raw_output_generated_callback. Is there any example of how to register the callback?
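Based on the callback typedef in gstnvdsinfer.h, I assume the registration would look roughly like this (untested sketch; the sgie variable and the callback body are my own placeholders):

#include <gst/gst.h>
#include "gstnvdsinfer.h"  /* gst_nvinfer_raw_output_generated_callback */

/* Untested sketch: called once per inferred batch with the raw output
 * layer buffers, before any postprocessing/parsing. */
static void
raw_output_cb (GstBuffer *buf, NvDsInferNetworkInfo *network_info,
    NvDsInferLayerInfo *layers_info, guint num_layers, guint batch_size,
    gpointer user_data)
{
  for (guint i = 0; i < num_layers; i++)
    g_print ("layer %s, batch_size %u\n",
        layers_info[i].layerName, batch_size);
}

/* Registration on the nvinfer element ("sgie" is a placeholder for my
 * secondary GIE element): */
static void
register_raw_output_cb (GstElement *sgie)
{
  g_object_set (G_OBJECT (sgie),
      "raw-output-generated-callback", raw_output_cb,
      "raw-output-generated-userdata", NULL,
      NULL);
}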

Hi,

Please share your environment information first.

• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
• Requirement details (This is for new requirements. Include the module name, i.e. which plugin or which sample application, and the function description.)

Are you using the NHWC or NCHW input data format?
Please note that DeepStream is designed for a multimedia pipeline,
so there are some assumptions about the input to inference.

You can find the detail in /opt/nvidia/deepstream/deepstream-5.0/sources/libs/nvdsinfer/nvdsinfer_context_impl.cpp file:

NvDsInferStatus InferPreprocessor::transform(
    NvDsInferContextBatchInput& batchInput, void* devBuf,
    CudaStream& mainStream, CudaEvent* waitingEvent)
{

...

}
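Conceptually, this transform converts each frame to the network's expected input layout and applies the configured normalization, roughly (a simplified sketch, not the exact source):

// Simplified view of the normalization nvinfer applies per pixel and
// channel, driven by net-scale-factor and offsets in the config file
// (illustrative only; see InferPreprocessor::transform for the real code):
//   y = net_scale_factor * (x - offsets[c])
float normalizePixel(float x, float netScaleFactor, float channelOffset)
{
    return netScaleFactor * (x - channelOffset);
}

If those assumptions do not match what your model expects (e.g. NHWC vs. NCHW), the input tensor content will be wrong even though the dimensions look correct.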

Thanks.

Hi, thanks. I am using AGX Xavier, JetPack 4.4, DeepStream 5.0, TensorRT 7.1, and CUDA 10.2.

The issue is parsing custom output.

My network has CTCGreedyDecoder as the last layer, and its output is decoded in the parser.
For batch size 4, the input and output sizes at CTCGreedyDecoder are
input size : 88 4 48
output size: 4 20
CTCGreedyDecoder arranges its output in one flat buffer (4*20 elements long) as
| 0…result0…19 | 0…result1…19 | 0…result2…19 | 0…result3…19 |

In TensorRT I can parse this successfully, taking the results one after another in output order.

But it does not work the same way in DeepStream.
I used NHWC, the same format as in TensorRT.

This is my parser in Deepstream.

extern "C"
bool NvDsInferParseCustomCTCGreedy (std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
        NvDsInferNetworkInfo  const &networkInfo,
        float classifierThreshold,
        std::vector<NvDsInferAttribute> &attrList,
        std::string &descString)
{
   for (unsigned int i = 0; i < outputLayersInfo.size(); i++) {
       if (strcmp(outputLayersInfo[i].layerName, "d_predictions:0") == 0) {
          NvDsInferDimsCHW dims;
          getDimsCHWFromDims(dims, outputLayersInfo[i].inferDims);
          std::cerr << "dims " << dims.c << std::endl;
          std::vector<char> str;
          float* data = (float *) outputLayersInfo[i].buffer;
          for(unsigned int d = 0; d < dims.c ; d++){
              //std::cerr << (int)*(data+d) << std::endl;
              //if(*(data+d) < 0)
              //   break;
              str.push_back(decode[(int)*(data+d)]);
          }
          std::string s(str.begin(), str.end());
          std::cerr << "decoded as " << s << std::endl;
          std::vector<char>().swap(str);
       }
    }
    return true;  

}
CHECK_CUSTOM_CLASSIFIER_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomCTCGreedy);

Even though CTCGreedyDecoder has all outputs in one buffer, the parser is called once per frame with a 20-element buffer,
and then I do not get correct output.

I think I am not getting the right output buffer in the parsing function, so the output data is always wrong.

Can I send my code so that you can reproduce the issue?

Any suggestions on why I can’t parse correct output?

Can I send my code so that you can reproduce the issue?

Yes.
Please include both the working TensorRT source and the non-working DeepStream source.
You can send them through a private message and update with a comment here.

Thanks.

I have sent all the source code and the required test data. Thanks.

Please check the topic "Parsing output issue" for updates.