Parsing output issue

Yes, it looks like there are some improvements.
My second pgie’s input batch is supposed to be dynamic, with a max batch size of 10.
After updating according to your suggestions, these are my observations.

Test with deepstream-app.
The outputs are

KLT Tracker Init
mInputDims from enqueue 88 10 48
mOutputDims from enqueue 10 20
before print
26 20 29 19 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 20 29 19 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 20 26 20 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 20 26 20 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 20 26 20 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 11 11 11 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 11 11 14 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 11 11 14 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 11 3 14 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 -1 -1 -1 -1 -1 -1 -1 -1 -1 
dims 20
decoded as 020202020
dims 20
decoded as 0
dims 20
decoded as 020202020
dims 20
decoded as 0
dims 20
decoded as 020202020
dims 20
decoded as 0
dims 20
decoded as 020202020
dims 20
decoded as 0
dims 20
decoded as 020202020
dims 20
decoded as 0
mInputDims from enqueue 88 10 48
mOutputDims from enqueue 10 20
before print
26 11 11 11 11 11 11 11 11 11 11 11 -1 -1 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 14 -1 -1 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 11 -1 -1 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 11 -1 -1 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 11 -1 -1 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 8 11 11 11 11 11 11 14 -1 -1 -1 -1 -1 -1 26 11 14 11 14 11 11 11 11 11 11 11 14 -1 -1 -1 -1 -1 -1 -1 26 19 26 17 5 5 5 9 5 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 19 17 9 17 5 9 9 16 5 9 9 -1 -1 -1 -1 -1 -1 -1 -1 26 19 26 6 9 6 9 26 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 
dims 20
decoded as 02020202020202020202
dims 20
decoded as 02020
dims 20
decoded as 02020202020202020202
dims 20
decoded as 02020
dims 20
decoded as 02020202020202020202
dims 20
decoded as 02020
dims 20
decoded as 02020202020202020202
dims 20
decoded as 02020
dims 20
decoded as 02020202020202020202
dims 20
decoded as 02020
mInputDims from enqueue 88 10 48
mOutputDims from enqueue 10 20
before print
26 17 5 5 6 9 5 6 9 17 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 19 26 9 5 5 9 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 17 26 9 9 5 5 26 17 9 6 8 9 -1 -1 -1 -1 -1 -1 -1 26 17 26 9 9 5 5 26 17 9 6 8 9 -1 -1 -1 -1 -1 -1 -1 26 19 17 8 6 6 17 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 19 17 17 5 9 5 9 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 19 17 9 9 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 19 26 6 9 6 9 26 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 17 26 6 5 9 17 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 17 26 9 9 5 5 26 17 9 6 8 9 -1 -1 -1 -1 -1 -1 -1 
dims 20
decoded as 02020202020202020202
dims 20
decoded as 0
dims 20
decoded as 020202020202020
dims 20
decoded as 0
dims 20
decoded as 02020202020202020202
dims 20
decoded as 0202020
dims 20
decoded as 02020202020202020202
dims 20
decoded as 0202020
dims 20
decoded as 020202020202020
dims 20
decoded as 0
mInputDims from enqueue 88 2 48
mOutputDims from enqueue 2 20
before print
26 19 26 17 9 5 6 9 5 9 9 16 -1 -1 -1 -1 -1 -1 -1 -1 26 19 26 17 9 5 6 9 5 9 9 16 -1 -1 -1 -1 -1 -1 -1 -1 
dims 20
decoded as 02020202020202020202
dims 20
decoded as 02020
q
Quitting

Observations
With the earlier setting of batch-size=1, the program froze and I needed to reset the AGX Xavier.
After setting batch-size=10, the program can now run until the Q key is pressed. So that is an improvement.

The primary pgie’s detection output is not 10.
Sometimes it is 3, sometimes 4, 5, 6, etc., but it never reaches 10.
(1) Why is the batch size passed to the custom plugin 10?
Print from the custom plugin:
mInputDims from enqueue 88 10 48
mOutputDims from enqueue 10 20

26 20 29 19 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 20 29 19 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 20 26 20 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 20 26 20 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 20 26 20 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 11 11 11 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 11 11 14 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 11 11 14 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 11 3 14 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 -1 -1 -1 -1 -1 -1 -1 -1 -1

It looks like the inputs to the sgie are repeated.

When tested with deepstream-test2 with the pgie’s batch-size=10, the batch size is correct. So the dynamic batch size changes according to the number of detections at the pgie. But there are still errors in the custom plugin CTCGreedyDecoder’s outputs: the 1st is correct, the 2nd and 3rd are wrong.
mInputDims from enqueue 88 3 48
mOutputDims from enqueue 3 20
before print
26 20 26 26 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 11 26 11 11 11 11 11 11 11 11 11 14 -1 -1 -1 -1 -1 -1 -1 26 19 26 17 9 8 17 5 9 5 4 17 9 26 -1 -1 -1 -1 -1 -1
dims 20
decoded as 020202020
dims 20
decoded as 0
dims 20
decoded as 02020202020202020202
The input at the custom parser is also wrong; it is not the same as the CTCGreedyDecoder’s output.

Hi,

Let’s focus on the deepstream-app first.

We are not sure if we understand your problem correctly.
Your pipelines for Deepstream and TensorRT seem to be different:

Deepstream: image → detection(primary) → classification(secondary) → plugin → output
TensorRT: image → classification → plugin → output

If the above is correct, then the result is expected.
In the default deepstream pipeline, all the images pass into the primary-gie first.
Deepstream will integrate 20 images into a batch and feed them into the detector at the same time.

After that, deepstream will extract the ROI regions based on the detected bounding boxes and feed them into the classifier.
As a result, the number of ROI regions will differ depending on the detection output.

If you want to feed the 20 images into the classifier without cropping, please use the primary-gie component and point it to the classifier directly.

Thanks.

My expectation and requirement for Deepstream is:
Deepstream: image → detection (primary) → crop bounding boxes → classification (secondary) + plugin → output

The pgie will always process only one image.
The pgie detector will detect and crop bounding boxes. The bounding boxes are then fed to the text classifier.
How many bounding boxes are fed depends on how many objects are detected by the pgie.
That is why I need the classifier to accept a dynamic batch.
The custom plugin is the last output layer of the classifier.

For TensorRT, yes, the cropped bounding boxes are directly fed to the classifier.
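For Deepstream, my understanding of the sgie wiring is roughly the sketch below. The values are illustrative placeholders (exact key names depend on the DeepStream version); only the batch size, parser function name, and custom-lib-path are from my setup.

# Illustrative sgie (classifier) nvinfer config sketch -- placeholder values
[property]
# run as secondary, on objects cropped from the pgie (assumed gie-unique-id 1)
gie-unique-id=2
operate-on-gie-id=1
process-mode=2
# classifier network; maximum batch of 10, actual batch follows the number of detections
network-type=1
batch-size=10
parse-classifier-func-name=NvDsInferParseCustomCTCGreedy
custom-lib-path=/usr/src/tensorrt/CTCGreedyDecoder_Plugin/build/libCTCGreedyDecoder.so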

Is there anything I need to modify?

Hi,

The pgie can accept multiple images. This is controlled via the batch-size parameter.

For the sgie, as you already know, the batch is formed based on the number of detected bboxes.
So you may get some dummy output at batch index N if N > #bbox.

Thanks.

What about the fact that I am not getting the same data at the custom parser as the CTCGreedyDecoder output?
And what about the deepstream-test2 results not being correct?

My output from CTCGreedyDecoder is arranged as below.
The batch outputs are laid out one after another:
| 0…result0…19 | 0…result1…19 | 0…result2…19 | 0…result3…19 |
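In other words, I expect each batch item to occupy a fixed 20-element slot, which can be read back as in the small sketch below (outputBuffer and batchSize are placeholders for whatever the caller has; -1 is the padding value shown in the logs above):

// Sketch only: slicing the flat CTCGreedyDecoder output per batch item.
// outputBuffer/batchSize are placeholders; 20 and the -1 padding come from the dims above.
#include <iostream>

void dumpDecoderOutput(const float *outputBuffer, int batchSize)
{
    const int kMaxDecodedLen = 20;                                 // elements reserved per item
    for (int b = 0; b < batchSize; ++b) {
        const float *result = outputBuffer + b * kMaxDecodedLen;  // start of result b
        for (int d = 0; d < kMaxDecodedLen && result[d] >= 0; ++d) // -1 marks padding
            std::cerr << (int)result[d] << " ";
        std::cerr << std::endl;
    }
}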

For the custom parser below, the parser is called once per image, with outputLayersInfo.size() being 20.

extern "C"
bool NvDsInferParseCustomCTCGreedy (std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
        NvDsInferNetworkInfo  const &networkInfo,
        float classifierThreshold,
        std::vector<NvDsInferAttribute> &attrList,
        std::string &descString)
{
   for (unsigned int i = 0; i < outputLayersInfo.size(); i++) {
       if (strcmp(outputLayersInfo[i].layerName, "d_predictions:0") == 0) {
          NvDsInferDimsCHW dims;
          getDimsCHWFromDims(dims, outputLayersInfo[i].inferDims);
          std::cerr << "dims " << dims.c << std::endl;
          //std::vector<char> str;
          float* data = (float *) outputLayersInfo[i].buffer;
          for(unsigned int d = 0; d < dims.c ; d++){
              std::cerr << (int)*(data+d) << " ";
              //if(*(data+d) < 0)
              //   break;
              //str.push_back(decode[(int)*(data+d)]);
          }
          std::cerr << std::endl;
          //std::string s(str.begin(), str.end());
          //std::cerr << "decoded as " << s << std::endl;
          //std::vector<char>().swap(str);
       }
    }
    return true;  

}

For a batch size of 10, the outputs are as follows.
20 is outputLayersInfo.size(). Why are the outputs different from the CTCGreedyDecoder outputs?

dims 20
0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 
dims 20
0 2 0 2 0 2 0 -1 0 -1 0 -1 0 -1 0 -1 0 -1 0 -1 
dims 20
0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 
dims 20
0 2 0 2 0 2 0 2 0 2 0 -1 0 -1 0 -1 0 -1 0 -1 
dims 20
0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 
dims 20
0 2 0 2 0 2 0 2 0 2 0 -1 0 -1 0 -1 0 -1 0 -1 
dims 20
0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 
dims 20
0 2 0 2 0 2 0 2 0 2 0 -1 0 -1 0 -1 0 -1 0 -1 
dims 20
0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 0 2 
dims 20
0 2 0 2 0 2 0 2 0 2 0 2 0 -1 0 -1 0 -1 0 -1

We need to sort this out.


I think you need to check this post.

I made the program work.
For this one, I don’t need the parser.
The output is taken from sgie_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info, gpointer u_data).

The buffer data is taken from here.

float *outputCoverageBuffer =
            (float *) meta->output_layers_info[0].buffer;

It is the same data as I saw in the custom parser buffer.
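For context, the probe reaches that buffer roughly as in the sketch below (modeled on the deepstream-infer-tensor-meta sample pattern; header names and member names are as I recall from the SDK, and my real probe differs in the details):

// Sketch of how sgie_pad_buffer_probe can reach the sgie's raw output tensor,
// following the deepstream-infer-tensor-meta sample pattern; error checks omitted.
#include "gstnvdsmeta.h"
#include "gstnvdsinfer.h"

static GstPadProbeReturn
sgie_pad_buffer_probe (GstPad * pad, GstPadProbeInfo * info, gpointer u_data)
{
  GstBuffer *buf = (GstBuffer *) info->data;
  NvDsBatchMeta *batch_meta = gst_buffer_get_nvds_batch_meta (buf);

  for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame; l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *) l_frame->data;
    for (NvDsMetaList *l_obj = frame_meta->obj_meta_list; l_obj; l_obj = l_obj->next) {
      NvDsObjectMeta *obj_meta = (NvDsObjectMeta *) l_obj->data;
      // One tensor-output user meta is attached per object inferred by the sgie.
      for (NvDsMetaList *l_user = obj_meta->obj_user_meta_list; l_user; l_user = l_user->next) {
        NvDsUserMeta *user_meta = (NvDsUserMeta *) l_user->data;
        if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
          continue;
        NvDsInferTensorMeta *meta = (NvDsInferTensorMeta *) user_meta->user_meta_data;
        // d_predictions:0 is output_layers_info[0] in my case.
        float *outputCoverageBuffer = (float *) meta->output_layers_info[0].buffer;
        // ... print the 20 values here ...
      }
    }
  }
  return GST_PAD_PROBE_OK;
}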

mInputDims from enqueue 88 3 48
mOutputDims from enqueue 3 20
Could not find bbox layer buffer while parsing
Could not find bbox layer buffer while parsing
before print
26 20 29 26 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 26 11 11 11 11 11 11 11 11 11 11 11 -1 -1 -1 -1 -1 -1 -1 -1 26 19 26 17 5 5 5 9 5 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 

Only the first result is correct. The second and third are wrong.

I suspect the sgie is not getting the correct cropped image from the pgie. Where should I check?

When I print meta->output_layers_info[0] in sgie_pad_buffer_probe, I get:

p  meta->output_layers_info[0]
$4 = {dataType = FLOAT, {inferDims = {numDims = 1, d = {20, 0, 0, 0, 0, 0, 0, 
        0}, numElements = 20}, dims = {numDims = 1, d = {20, 0, 0, 0, 0, 0, 0, 
        0}, numElements = 20}}, bindingIndex = 1, 
  layerName = 0x5566fa9e98 "d_predictions:0", buffer = 0x20da9c200, 
  isInput = 0}

Could it be that the way the output is arranged in CTCGreedyDecoder doesn’t match what deepstream requires?

Hi,

Thanks for keeping us updated.

We are trying to reproduce this issue and find a possible cause with the data shared via private message.
This may take some time, but we will keep you updated.

Thanks.

Thank you. May I know how long it will take, so that I can manage my customer on this project?

Hi,

Sorry, but this is issue-dependent.
But we will keep you updated once we have any progress.

Thanks.

Hi,

Thanks for your patience.

We tried to reproduce this issue but got stuck at the plugin library.
The CTCGreedyDecoder layer requires a TensorRT plugin, an extended ONNX parser, and a custom output parser for the deepstream sgie.

However, we can only find the CTCGreedyDecoder_Plugin implementation but no extended ONNX parser in the source code.
This leads to the “No importer registered” error from both TensorRT and Deepstream.

While parsing node number 61 [CTCGreedyDecoder]:
ERROR: /home/nvidia/TensorRT/parsers/onnx/ModelImporter.cpp:134 In function parseGraph:
[8] No importer registered for op: CTCGreedyDecoder
ERROR: Failed to parse onnx file
ERROR: failed to build network since parsing model errors.
ERROR: failed to build network.

Have you added CTCGreedyDecoder layer support to the ONNX parser yet?

Thanks.

The folder I passed has everything inside. The implementation is in /Deepstream/CTCGreedyDecoder_Plugin/plugin/CTCGreedyDecoder.cpp, and the .so file is /usr/src/tensorrt/CTCGreedyDecoder_Plugin/build/libCTCGreedyDecoder.so.
You may need to rebuild it again.
I have linked it in the deepstream config file as:

custom-lib-path=/usr/src/tensorrt/CTCGreedyDecoder_Plugin/build/libCTCGreedyDecoder.so

CTCGreedyDecoder.cpp has both the custom parser and CTCGreedyDecoder inside.

The following code is the custom parser for deepstream; the rest is for CTCGreedyDecoder.

extern "C"
bool NvDsInferParseCustomCTCGreedy (std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
        NvDsInferNetworkInfo  const &networkInfo,
        float classifierThreshold,
        std::vector<NvDsInferAttribute> &attrList,
        std::string &descString)
{
   for (unsigned int i = 0; i < outputLayersInfo.size(); i++) {
       if (strcmp(outputLayersInfo[i].layerName, "d_predictions:0") == 0) {
          NvDsInferDimsCHW dims;
          getDimsCHWFromDims(dims, outputLayersInfo[i].inferDims);
          std::cerr << "dims " << dims.c << std::endl;
          //std::vector<char> str;
          float* data = (float *) outputLayersInfo[i].buffer;
          for(unsigned int d = 0; d < dims.c ; d++){
              std::cerr << (int)*(data+d) << " ";
              //if(*(data+d) < 0)
              //   break;
              //str.push_back(decode[(int)*(data+d)]);
          }
          std::cerr << std::endl;
          //std::string s(str.begin(), str.end());
          //std::cerr << "decoded as " << s << std::endl;
          //std::vector<char>().swap(str);
       }
    }
    return true;  

}
CHECK_CUSTOM_CLASSIFIER_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomCTCGreedy);

Thanks, brother. Please let me know if anything needs updating from my side.

Hi,

Since the model is in ONNX format, you will need to add the CTCGreedyDecoder layer to the ONNX parser so it can be parsed.

Have you added the parsing to the ONNX parser?
If not, how do you convert your model into a TensorRT .plan?

Thanks.

No, I didn’t add it to the ONNX model. That CTCGreedyDecoder plugin is for TensorRT, so it was added to TensorRT.
For Deepstream, do I need to add the plugin to the ONNX model, not to TensorRT?

Hi,

You will need to add the plugin to the ONNX parser for both the TensorRT and Deepstream use cases.

The workflow of TensorRT looks like this:

ONNX → (onnx2trt) → TensorRT PLAN → (TensorRT inference)

Since your model is in ONNX format, you will need to add the plugin to both TensorRT and the ONNX parser.

Have you run inference on the model with pure TensorRT before?
If yes, could you share some details about the procedure?

Thanks.

I see, thanks. That is good news for me.
The zip file provided to you has a TensorRT folder. Inside is the implementation of the sgie in TensorRT.
Images for testing are also provided. Please let me know if you need clarification.
I’ll try to add CTCGreedyDecoder into ONNX.

The procedure was just to create the plugin and register it with TensorRT.
At compile time, TensorRT knows where to find that CTCGreedyDecoder plugin.
I thought the same thing would happen for Deepstream, since Deepstream uses TensorRT underneath. Isn’t that so?

Hi @AastaLLL,
Can you please confirm that I need to add CTCGreedyDecoder into ONNX, not into TensorRT, in order to use Deepstream?

Hi,

As you know, Deepstream uses TensorRT as the inference backend.
The limitation does come from TensorRT, but it causes errors in both TensorRT and Deepstream.

Please note that you don’t need to implement the plugin in ONNX, but rather in the ONNX parser.
More precisely, you will need to write the mapping (parameters, tensors, plugin, …) in the file below:

https://github.com/onnx/onnx-tensorrt/blob/eb559b6cdd1ec2169d64c0112fab9b564d8d503b/builtin_op_importers.cpp

You can follow the implementation for InstanceNormalization.
It mainly parses the definition in ONNX and feeds it into the TensorRT plugin.
https://github.com/onnx/onnx-tensorrt/blob/eb559b6cdd1ec2169d64c0112fab9b564d8d503b/builtin_op_importers.cpp#L1635

DEFINE_BUILTIN_OP_IMPORTER(InstanceNormalization)
{
    // Scales and biases must be initializers
    ASSERT(inputs.at(1).is_weights(), ErrorCode::kUNSUPPORTED_NODE);
    ASSERT(inputs.at(2).is_weights(), ErrorCode::kUNSUPPORTED_NODE);
    nvinfer1::ITensor* tensorPtr = &convertToTensor(inputs.at(0), ctx);
    int nbDims = tensorPtr->getDimensions().nbDims;
    ASSERT(nbDims >= 3 && nbDims <= 4 && "TensorRT only supports InstanceNormalization on 3D or 4D tensors!",
        ErrorCode::kUNSUPPORTED_NODE);
    auto scale_weights = inputs.at(1).weights();
    auto bias_weights = inputs.at(2).weights();
    OnnxAttrs attrs(node, ctx);
    float epsilon = attrs.get("epsilon", 1e-5f);

    // Populate instanceNormalization plugin properties.
    const std::string pluginName = "InstanceNormalization_TRT";
    const std::string pluginVersion = "1";
    std::vector<nvinfer1::PluginField> f;
    f.emplace_back("epsilon", &epsilon, nvinfer1::PluginFieldType::kFLOAT32, 1);
    f.emplace_back("scales", scale_weights.values, nvinfer1::PluginFieldType::kFLOAT32, scale_weights.count());
    f.emplace_back("bias", bias_weights.values, nvinfer1::PluginFieldType::kFLOAT32, bias_weights.count());

    // Create plugin from registry
    nvinfer1::IPluginV2* plugin = createPlugin(node.name(), importPluginCreator(pluginName, pluginVersion), f);

    ASSERT(plugin != nullptr && "InstanceNormalization plugin was not found in the plugin registry!",
        ErrorCode::kUNSUPPORTED_NODE);

    auto* layer = ctx->network()->addPluginV2(&tensorPtr, 1, *plugin);
    ctx->registerLayer(layer, node.name());
    RETURN_FIRST_OUTPUT(layer);
}

Thanks.

I see, thanks. I need to work on the ONNX parser so it can map the plugin into TensorRT.
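Based on the InstanceNormalization example above, I imagine something like the sketch below inside builtin_op_importers.cpp. The plugin creator name, version, and the assumption that no plugin fields are needed are placeholders; they must match what CTCGreedyDecoder.cpp actually registers.

// Rough sketch only: mapping the ONNX CTCGreedyDecoder node onto the existing
// TensorRT plugin, following the InstanceNormalization pattern above.
DEFINE_BUILTIN_OP_IMPORTER(CTCGreedyDecoder)
{
    nvinfer1::ITensor* tensorPtr = &convertToTensor(inputs.at(0), ctx);

    // No plugin fields are passed here; add them if the plugin creator expects any.
    std::vector<nvinfer1::PluginField> f;

    const std::string pluginName = "CTCGreedyDecoder_TRT";   // assumed creator name
    const std::string pluginVersion = "1";                   // assumed version
    nvinfer1::IPluginV2* plugin =
        createPlugin(node.name(), importPluginCreator(pluginName, pluginVersion), f);
    ASSERT(plugin != nullptr && "CTCGreedyDecoder plugin was not found in the plugin registry!",
        ErrorCode::kUNSUPPORTED_NODE);

    auto* layer = ctx->network()->addPluginV2(&tensorPtr, 1, *plugin);
    ctx->registerLayer(layer, node.name());
    RETURN_FIRST_OUTPUT(layer);
}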