My network has a custom CTCGreedyDecoder layer. I implemented it as a TensorRT plugin and tested it in TensorRT; everything succeeded and I get correct results. Now I would like to use DeepStream, and I have implemented all the requirements. DeepStream needs a custom parser: CTCGreedyDecoder is the last layer, and its output is decoded in the parser.
CTCGreedyDecoder works as follows. If I have batch size 4, the input and output sizes at CTCGreedyDecoder are
input size : 88 4 48
output size: 4 20
At CTCGreedyDecoder, the output is arranged in one flat buffer (4*20 floats in length) as
|0…result0…19|0…result1…19|0…result2…19|0…result3…19|
In TensorRT, I parse the outputs one after another. In DeepStream, the same custom plugin is used, but I don't see correct results. I can still see the correct dimensions at the CTCGreedyDecoder input and output. But in the output-parsing part, the following functions are called once per image in the batch: if the batch size is 4, they are called 4 times; if the batch size is 3, 3 times.
nvdsinfer_context_impl.cpp (815:823)
NvDsInferStatus
ClassifyPostprocessor::parseEachBatch(
const std::vector<NvDsInferLayerInfo>& outputLayers,
NvDsInferFrameOutput& result)
{
result.outputType = NvDsInferNetworkType_Classifier;
fillClassificationOutput(outputLayers, result.classificationOutput);
return NVDSINFER_SUCCESS;
}
nvdsinfer_context_impl_output_parsing.cpp(782:826)
NvDsInferStatus
ClassifyPostprocessor::fillClassificationOutput(
const std::vector<NvDsInferLayerInfo>& outputLayers,
NvDsInferClassificationOutput& output)
{
string attrString;
vector<NvDsInferAttribute> attributes;
/* Call custom parsing function if specified otherwise use the one
* written along with this implementation. */
if (m_CustomClassifierParseFunc)
{
//std::cout << "Inside m_CustomClassifierParseFunc" << std::endl;
if (!m_CustomClassifierParseFunc(outputLayers, m_NetworkInfo,
m_ClassifierThreshold, attributes, attrString))
{
printError("Failed to parse classification attributes using "
"custom parse function");
return NVDSINFER_CUSTOM_LIB_FAILED;
}
}
else
{
if (!parseAttributesFromSoftmaxLayers(outputLayers, m_NetworkInfo,
m_ClassifierThreshold, attributes, attrString))
{
printError("Failed to parse bboxes");
return NVDSINFER_OUTPUT_PARSING_FAILED;
}
}
/* Fill the output structure with the parsed attributes. */
output.label = strdup(attrString.c_str());
output.numAttributes = attributes.size();
output.attributes = new NvDsInferAttribute[output.numAttributes];
for (size_t i = 0; i < output.numAttributes; i++)
{
output.attributes[i].attributeIndex = attributes[i].attributeIndex;
output.attributes[i].attributeValue = attributes[i].attributeValue;
output.attributes[i].attributeConfidence = attributes[i].attributeConfidence;
output.attributes[i].attributeLabel = attributes[i].attributeLabel;
}
return NVDSINFER_SUCCESS;
}
That means parsing is not done on the whole batch at once; the output buffer from CTCGreedyDecoder is cut into separate per-image segments according to the number of images in the batch.
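This is my simplified reading of the behavior, not the real DeepStream code (the struct and function names below are my own, not NvDsInferLayerInfo or the actual postprocessor): the parse function seems to be invoked once per frame, and each invocation receives a buffer that already points at that frame's 20-float segment.

```cpp
#include <cstddef>
#include <vector>

// Simplified mock of per-frame parsing (names are mine, not DeepStream's).
struct MockLayerInfo {
    const float* buffer;   // per-frame slice, NOT the whole batched buffer
    int numElements;       // 20 in my case
};

// Mimics the parse function being invoked batchSize times, once per frame,
// and records the first element each invocation sees.
inline std::vector<float> firstElementPerFrame(const float* batched,
                                               int batchSize, int perFrame) {
    std::vector<float> firsts;
    for (int f = 0; f < batchSize; ++f) {
        MockLayerInfo info{batched + f * perFrame, perFrame}; // the slicing
        firsts.push_back(info.buffer[0]);
    }
    return firsts;
}
```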
The following function is in custom parser.
extern "C"
bool NvDsInferParseCustomCTCGreedy (std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
NvDsInferNetworkInfo const &networkInfo,
float classifierThreshold,
std::vector<NvDsInferAttribute> &attrList,
std::string &descString)
{
for (unsigned int i = 0; i < outputLayersInfo.size(); i++) {
if (strcmp(outputLayersInfo[i].layerName, "d_predictions:0") == 0) {
NvDsInferDimsCHW dims;
getDimsCHWFromDims(dims, outputLayersInfo[i].inferDims);
std::cerr << "dims " << dims.c << std::endl;
std::vector<char> str;
float* data = (float *) outputLayersInfo[i].buffer;
for(unsigned int d = 0; d < dims.c ; d++){
//std::cerr << (int)*(data+d) << std::endl;
//if(*(data+d) < 0)
// break;
str.push_back(decode[(int)*(data+d)]);
}
std::string s(str.begin(), str.end());
std::cerr << "decoded as " << s << std::endl;
std::vector<char>().swap(str);
}
}
return true;
}
CHECK_CUSTOM_CLASSIFIER_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomCTCGreedy);
outputLayersInfo.size() is always 1, and dims.c, dims.w, dims.h are 20, 0, 0.
So the output buffer from CTCGreedyDecoder has been cut into separate per-image segments.
When I parse a 20-element output buffer, I don't get correct results.
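Given the per-frame slicing, this is a sketch of the decode I think the parser needs: treat the buffer as a single result of 20 indices (not 4*20), stop at the padding value (I assume it is encoded as a negative index, as my commented-out check suggests), and map indices through the label table. `decodeTable` stands in for my real `decode` array; the function name is my own.

```cpp
#include <string>

// Hypothetical per-frame decode: `data` is one frame's 20-float slice,
// `len` is the per-frame length, `decodeTable` maps class index to character.
inline std::string decodeFrame(const float* data, int len,
                               const char* decodeTable) {
    std::string s;
    for (int d = 0; d < len; ++d) {
        int idx = static_cast<int>(data[d]);
        if (idx < 0)            // assumed padding / end-of-sequence marker
            break;
        s.push_back(decodeTable[idx]);
    }
    return s;
}
```

In NvDsInferParseCustomCTCGreedy I would also fill attrList and descString with the decoded result, since fillClassificationOutput copies attrString into output.label; leaving them empty means nothing reaches the downstream metadata even when decoding works.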
My sgie config file is as follows.
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
onnx-file=../../../../samples/models/platerect/numplate_recg_nhwc_removed_sparsetodense.onnx
model-engine-file=../../../../samples/models/platerect/numplate_recg_nhwc_removed_sparsetodense.onnx_batch_max10_gpu0_fp16.engine
#mean-file=../../../../samples/models/Secondary_CarColor/mean.ppm
labelfile-path=../../../../samples/models/platerect/labels.txt
#int8-calib-file=../../../../samples/models/Secondary_CarColor/cal_trt.bin
infer-dims=24;94;3
force-implicit-batch-dim=0
batch-size=10
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
input-object-min-width=20
input-object-min-height=10
process-mode=2
model-color-format=0
gpu-id=0
gie-unique-id=2
operate-on-gie-id=1
operate-on-class-ids=1
network-type=1
parse-classifier-func-name=NvDsInferParseCustomCTCGreedy
#parse-bbox-func-name=NvDsInferParseCustomCTCGreedy
custom-lib-path=/usr/src/tensorrt/CTCGreedyDecoder_Plugin/build/libCTCGreedyDecoder.so
output-blob-names=d_predictions:0
classifier-threshold = 0
I implemented this successfully in TensorRT, but it fails in DeepStream. What could be the problem?
Where should I look? Can someone suggest?